JOURNAL OF GEOPHYSICAL RESEARCH, VOL. 104, NO. D6, PAGES 6199-6213, MARCH 27, 1999

A comparison of paired histogram, maximum likelihood, class elimination, and neural network approaches for daylight global cloud classification using AVHRR imagery

T. A. Berendes,1 K. S. Kuo,1 A. M. Logar,2 E. M. Corwin,2 R. M. Welch,1 B. A. Baum,3 A. Pretre,4 and R. C. Weger5

1 Department of Atmospheric Sciences, Global Hydrology and Climate Center, University of Alabama in Huntsville.
2 Department of Mathematics and Computer Science, South Dakota School of Mines and Technology, Rapid City.
3 Atmospheric Sciences Division, NASA Langley Research Center, Hampton, Virginia.
4 Martin and Associates, Inc., Mitchell, South Dakota.
5 Institute of Atmospheric Sciences, South Dakota School of Mines and Technology, Rapid City.

Copyright 1999 by the American Geophysical Union. Paper number 98JD02584.

Abstract. The accuracy and efficiency of four approaches to identifying clouds and aerosols in remote sensing imagery are compared. These approaches are as follows: a maximum likelihood classifier, a paired histogram technique, a hybrid class elimination approach, and a back-propagation neural network. Regional comparisons were conducted on advanced very high resolution radiometer (AVHRR) local area coverage (LAC) scenes from the polar regions, desert areas, and regions of biomass burning, areas which are known to be particularly difficult. For the polar, desert, and biomass-burning regions, the maximum likelihood classifier achieved 94-97% accuracy, the neural network achieved 95-96% accuracy, and the paired histogram approach achieved 93-94% accuracy. The primary advantage of the class elimination scheme lies in its speed; its accuracy of 94-96% is comparable to that of the maximum likelihood classifier. Experiments also clearly demonstrate the effectiveness of decomposing a single global classifier into separate regional classifiers, since the regional classifiers can be more finely tuned to recognize local conditions. In addition, the effectiveness of using composite features is compared to the simpler approach of using the five AVHRR channels and the reflectance of channel 3, treated as a sixth channel, as the elements of the feature vector. The results varied, demonstrating that the features cannot be chosen independently of the classifier to be used. It is also shown that superior results can be obtained by training the classifiers using subclass information and collapsing the subclasses after classification. Finally, ancillary data were incorporated into the classifiers, consisting of a land/water mask, a terrain map, and a computed sunglint probability. While the neural network did not benefit from this information, the accuracy of the maximum likelihood classifier improved by 1%, and the accuracy of the paired histogram method increased by up to 4%.

1. Introduction

Identifying clouds and aerosols in remote sensing imagery is an important first step in the retrieval of both surface and atmospheric properties, as well as in estimating radiative forcing for climate studies. The traditional approach to this problem has been to use a series of threshold tests related to spectral contrast [d'Entremont, 1986; Inoue, 1987; Prabhakara et al., 1988; Key and Barry, 1989], radiance spatial contrast [Arking, 1964], radiance temporal contrast [Reynolds and Vonder Haar, 1977; Minnis and Harrison, 1984], radiance spatial variance contrast [Coakley and Bretherton, 1982; Coakley and Baldwin, 1984], and radiance temporal variance contrast [Gutman et al., 1987].
0148-0227/99/98JD-02584509.00 6199 This paper compares the accuracy and efficiency of four alternative approachesto the cloud identification problem: a maximum likelihood classifier, a new paired histogram technique, a hybrid class elimination approach,and a backpropagationneuralnetwork. There are currentlytwo operationalglobalcloudclassification schemes,both of which are based upon thresholdtests. The InternationalSatellite Cloud ClimatologyProject (ISCCP) cloud maskingalgorithmis describedby Rossow[1989], Rossowand Garder [1993], Rossowet al., [1989a, b], and Sezeand Rossow, [1991]. The ISCCP algorithmis basedon the premisethat the observedvisible and infrared radiancesare causedby only two typesof conditions,"cloudy"and "clear," and that the rangesof radiancesand the variability that is associatedwith these two conditionsdo not overlap [Rossowand Garder, 1993]. As a result, the algorithmis basedupon thresholds,where a pixel is classified as "cloudy" only if at least one radiance value is distinctfrom the inferred "clear" value by an amountlarger than the uncertaintyin that "clear"value. The National Oceanic and Atmospheric Administration Clouds from AVHRR (CLAVR) algorithm (phase1) examines multispectral information, channel differences, and spatial differencesand then employsa seriesof sequentialdecisiontree tests [Stoweet al., 1991]. Cloud-free, mixed (variable cloudy), and cloudyregionsare identified for 2x2 global area coverage (GAC) pixel arrays. If all four pixels in the array fail all the cloud tests,then the array is labeled as cloud-free(0% cloudy); if all four pixels satisfyjust one of the cloudtests,then the array 6200 BERENDES ET AL.: COMPARISON is labeled as 100% cloudy. If one to three pixels satisfya cloud test, then the array is labeled as mixed and assignedan arbitrary value of 50% cloudy. If all four pixels of a mixed or cloudy array satisfy a clear-restoraltest (required for snow/ice, ocean specularreflection, and bright desert surfaces),then the pixel arrayis reclassifiedas "restoredclear" (0% cloudy). While cloud detectionover ocean surfacesis relatively well established, it is much more difficult OF CLASSIFIERS USING AVHRR The second question considered is as follows: can the accuracyof global classifiersbe improvedby subdividingthem into specialized regional classifiers? Experiments were performedto test the effectivenessof decomposinga single global classifier into separate regional classifiers,since the regionalclassifierscan be more finely tuned to recognizelocal conditions. over land surfaces. The final question concernswhether the accuracyof the classifiers can be improved by using subclasses. The EOS Clouds and the Earth's Radiant Energy System(CERES) effort does not require intraclassdistinctions. For example, seven categoriesof water cloud have been identified in the satellite scenesconsidered,but the primary task is to identify all of them as water cloud. Thus the questionwas whether it would be more effectiveto use the subclassinformationin designingthe order to be accurate. Alternative methods of cloud detection and classifiersor to consider all seven types of water cloud as a classificationhave been developed to addressthis weakness. singleclass. These methods promise improved accuracy over threshold Section 2 provides detailed information about the data, approaches,especiallyover difficult surfacessuch as ice/snow, classes, and features, and section 3 describes the various deserts,smoke,and sunglintbut at a highercomputationalcost. 
classificationmethodologies.Section4 presentsthe results,and Therefore the four techniques described here, a maximum section 5 concludes. likelihoodclassifier,a pairedhistogramtechnique,a hybridclass elimination approach,and a back-propagation neural network, 2. Data and Features are intercomparedin terms of both accuracyand computational expense.Thesetechniquesaredescribedin detailin section3. The variousclassifiersare trained and testedusing AVHRR The regional comparisonswere conductedon advancedvery LAC satelliteimagery. The spatialresolutionof the LAC data is high resolutionradiometer(AVHRR) local area coverage(LAC) 1.1 km at nadir. The spectraldata include AVHRR channels1 scenesfrom the polar regions, desert areas, and regions of (0.5-0.68 gm), 2 (0.725-1.1 gm), 3 (3.55-3.93 gm), 4 (10.5-11.5 biomassburning, areas which are known to be particularly gm), and 5 (11.5-12.5 gm), which includevisible,near-infrared, difficult. Three additional data sets were created to test the midinfrared,and infrared window regions. A sixth channelis generalizationabilities of the classifiers. The "global"data set created from the reflectance of channel 3. This channel is containssamplesfrom all areas of the Earth, the "other" set derived by removingthe channel 3 thermal emission,which is containsglobal data minus vectorswhich would fall within the estimatedusing channel4 emissiontemperature[Allen et al., purview of the regional classifiers, and the "twilight" set 1990]. Owing to the fact that the classifiers are trained and containsvectorsfrom the polar regionswhere the solar zenith testedon data spanningthe periodfrom 1985 until 1993 with the angleis between80ø and 85ø. NOAA 9, 10 and 11 sensors,calibration accuracyis essential. In additionto the accuracyand computationalexpenseissues The calibration algorithm is a standardtechniqueand is well relatedto the choiceof the classifierselected,this investigation documentedin the literature [e.g., Brown et al., ; Rao et al., focusesupon three other questions. First, doesthe use of more 1993, Rao and Chen, 1994; Weinrebet al., 1990;Kidwell, 1995]. advancedfeatures(other thanjust the five AVHRR channelsand Prior to their use in the classifier systems,the raw satellite a sixth channel derived from the reflectance of channel 3) channel values are calibrated to physical temperatureand significantlyimproveclassificationaccuracy?It has been shown reflectance values. Then, the calibrated values for each channel that spectral,textural,or combinations of spectraland textural are scaled between 0 and 1.0. Specifically, the temperature featurescan be effective for cloud identification,particularlyin channel value, which ranges from 180øK to 330øK, and the the polar regions[Baurnet al., 1995; Ebert, 1987, 1989; Welch reflectancevalue, which rangesfrom 0 to 100, are scaledfrom et al., 1990, 1992; Key and Barry, 1989; Key 1990; Rabindra et 0.0 to 1.0. al., 1992; Tovinkereet al., 1993]. Thesecategoriesrepresenta large numberof potentialfeatures;thus variousapproaches for 2.1. Sample Selection feature reduction are utilized in order to select a small subset Difficulties in cloud discriminationincreaseprogressivelywith increasingbrightness of the background. In addition, cloud detectiontechniquesare difficult to apply over land becauseof the high spatial heterogeneityof land surfaces, as well as seasonaland regional (including topographic)influences[Eck and Kalb, 1991; Cihlar and Howarth, 1994]. 
Threshold tests would need to be highly specificto the individualbackgroundin that bestseparates the classes[Richards,1993]. In the present investigation,a large numberof features,composedof linear and nonlinearcombinationsof the six channelsof data, are generated for the training data, and the paired histogramsystemselectsa small, optimal set of featuresfor eachclasspair. In this study, three features, which are optimally separatedand not highly correlated,are selectedfor each pair of classes. A subsetof these features, reduced by tighter divergenceand correlation requirements, is used by the neural network and maximum likelihoodclassifiers.This techniqueis comparedto the simpler approachof usingthe six channelsof data for each pixel as the elementsof the featurevector. Althoughtextural featureshave been effective in other research,they were not consideredhere becauseof the large computationaloverhead. A total of 268,472 samples were selectedfrom 91 AVHRR LAC scenesto createsix data sets. The globaldatasetcontains all of the availabletrainingand testingdata. The regionaldata sets, namely, South America at the time of heavy biomass burning,the polar regions,desertareas,and "twilight" regionsof highsolarzenithangle(©o> 80"), areextracted fromtheglobal data. Another subsetof the global data, labeledthe "other"data, is the global data which falls outsideof the polar, desert,and biomass-burning regions. The dateson which the imagerywas acquiredcan be foundin Table l a. The twilight sampleswere takenfrom the polar and "other"data set, so a separateentryis not given in Table 1a. The data were divided into trainingand testing sets tbr each region. The number of samplesused for BERENDES ET AL.: COMPARISON OF CLASSIFIERS USING AVHRR 6201 which can be viewed and manipulated outside of S1VIS, if desired. The 268,472 samplesusedin the presentinvestigation were all generatedusingthe SIVIS system. Table la, Datesof ImageryUsed Region Date of Imagery BiomassBurning (SouthAmerica) Polar (BeaufortSea, Aug. - Sept.1985 and Sept. 1986 April - June1987, April - Sept. 1988and 1989, Barent's Sea, June and December 1992, and GreenlandSea,and Baffin Bay) JanuaryandOctober1993 Desert June 1986, (North Africa and the Middle East) Other July 1988, and March,May, andJune198-9 JanuaryandOctober1993 2.2. Classes The eight classeswhich have been identified in the various data setsare listed in Table 2. However, histogramanalysisof the data shows that some of the classes are multimodal of snow/ice. trainingand testingcan be foundin Table lb. It is importantto note that the training and testing samples were taken from independentdata sets,that is, from differentregionsor from the same region but on different days or different years. It is a standardpractice to extract both training and testing samples from the same orbital swaths, but while this improves the reportedaccuracyof the classifier,it also providesa distorted estimateof classificationaccuracybecausethe data sets are not independent. The independenceof the trainingand testingsets was an importantrequirementtbr this researchand shouldbe a considerationin the evaluationof any classificationresults. Accurate labeling of training and testing samples is an importantfirst stepin the classificationprocess.An experienced analyst was aided in this processby the Satellite Imagery Visualization System (SIVIS). SIVIS was developed by V. Tovinkere to run on Silicon Graphics workstations. It is composedof a set of displaytools for visualizationand analysis of satellite data. 
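As an illustration of the preprocessing described at the beginning of this section, the sketch below scales calibrated brightness temperatures (180-330 K) and reflectances (0-100%) to the range 0.0-1.0 and forms a few simple derived features of the kind listed in Table 3. It is a minimal reading of the text, not the processing code used in this study; the function names and the choice of example features are illustrative assumptions.

```python
# Minimal sketch (not the authors' code) of the channel preprocessing described
# in section 2: calibrated AVHRR values are scaled to [0, 1], and simple derived
# features such as ratios and normalized differences are formed (cf. Table 3).
import numpy as np

T_MIN, T_MAX = 180.0, 330.0      # thermal channels, Kelvin
R_MIN, R_MAX = 0.0, 100.0        # reflectance channels, percent

def scale_temperature(t_kelvin):
    """Scale a brightness temperature (180-330 K) to the range [0, 1]."""
    return np.clip((t_kelvin - T_MIN) / (T_MAX - T_MIN), 0.0, 1.0)

def scale_reflectance(r_percent):
    """Scale a reflectance (0-100%) to the range [0, 1]."""
    return np.clip((r_percent - R_MIN) / (R_MAX - R_MIN), 0.0, 1.0)

def derived_features(ch, eps=1e-6):
    """Example higher-order features built from the six scaled channels:
    a channel ratio, a channel difference, and a normalized difference."""
    return np.stack([
        ch[0] / (ch[1] + eps),                      # ch1 / ch2
        ch[3] - ch[4],                              # ch4 - ch5
        (ch[0] - ch[4]) / (ch[0] + ch[4] + eps),    # (ch1 - ch5)/(ch1 + ch5)
    ])
```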
Plate 1 showsa typical SIVIS session. Tools providedto the analystincludehistogramequalization,contrast stretch,gray scaleinversion,and a spatialcoherenceplot. The analyst can display channel, channel ratio, and channel difference values along with ecosystem,sunglint, and other ancillaryinformation. Labeledsamplesare storedin a database The subclasses also are listed in Table 2. Note that some of the classesmay not be representedin the trainingset for a region. For example,desertsare not foundin polar scenes,and the snow/iceclassgenerallyis not foundin the desert scenes. 2.3. Feature Generation There is considerableuncertaintyin the literature as to the selectionof an optimumset of featuresfor use in classification, and there is an entire subdisciplinedevotedto feature reduction [Richards, 1993]. Some authors simply utilize the various sensorchannels,while othersform setsof higher-orderfeatures, such as channel differences, channel ratios, and normalized channel difference ratios (e.g., the normalized difference vegetationindex). However, many of these featuresmay be highly correlated or may provide little discriminatory information. While it is generally acceptedthat higher-order featuresdo provide somewhathigher accuracyin classification, there is little information concerningthe costs and benefits. Sincethere is no reliable guidanceas to an optimumselectionof features, the present investigationhas taken the approachof examining a very extensive number of features for possible incorporationinto the classificationprocess. The 185 features generatedin this studyare listed in Table 3. It shouldbe noted that although185 featuresare considered, the feature selection phase selects a much smaller subset of Table lb. Number of samplesusedfor eachclassin eachdata set Data Scenes Water Set in some of the features,indicating the presenceof subcategoriesof that class. For example, snow-coveredmountain is a subclassof snow/ice,displayinga distinct secondarypeak in the histogram; therefore,snow-coveredmountainsmay be considereda subclass Snow/ Ice Ice Cloud Land Water Cloud Desert Sunglint Haze / Smoke Polar Train 33 1644 10408 5979 1233 19199 3284 Test 21 1286 7144 4754 1302 12431 708 Train 45 11366 8292 10294 7464 9733 2808 6132 Test 40 9806 5811 5911 4123 4337 1176 4204 Train 47 13101 28541 19515 2640 1880 8143 2509 1103 6440 37 1143 776 5459 Test 1497 2601 1648 440 South America Desert Other Train 39 6917 12580 17208 26423 1272 2304 Test 25 4468 6596 9255 15789 1304 672 Train 55 12982 12971 27608 25462 55014 6517 2172 8220 Test 36 8190 5429 21354 22594 38915 2524 1812 4896 Global Twilight Train 18 620 1272 1240 432 3676 Test 16 488 404 712 240 2452 6202 BERENDES • SIVIS' .t¸: ½ont ET AL.: COMPARISON OF CLASSIFIERS USING AVHRR 1tJt S IIm ry V•sui• e •on$y' I' ! lnfm-nu 'im• •ampk Mode •' e III 1. l•: 11ttC's T I •1•o '1 , L 131 t•o $1Vt• 1, t ßm fm• t• .t9 # Spatial Coherence [•]con! •tStretch[•lCOntT• r lth 4 BId(lOS- nit.tSirth 115) Plate 1. Sample Satellite ImageryVisualizationSystem(SIVIS) session.The image in the upper left-hand cornerlabeled "Tile Image" is taken over Norway and Swedenand is displayedas a three-bandoverlay. It is a sectionof the largersceneshownin the lower left-handcomerlabeled"Snapshot Toolbox." BERENDES Table ET AL.: COMPARISON 2. 
List of Classes and Subclasses

Class Index   Class Label        Subclasses
1             Water              -
2             Snow/ice           snow/ice; broken sea ice; snow-covered mountains
3             Ice cloud          -
4             Land (nondesert)   -
5             Water cloud        thin water cloud over land; water cloud over desert; stratus over land; thin water cloud in polar regions; stratus over oceans; cumulus over oceans; cumulus over land
14            Desert             -
17            Sunglint           channel 1 < 11%; channel 1 > 11%
27            Haze               smoke over land; smoke over oceans; dust/aerosols over land; dust/aerosols over oceans

features which are effective for separating classes. Generally, only about 10-40 features are selected by the various classifiers. Note that features 102-161 are generated using the hue, saturation, value transform given by Foley et al. [1990]. This transform uses three channel values, or channel differences, to generate a set of three new numerical features which are representative of the red, green, and blue color combinations an analyst uses to determine class membership.

2.4. Feature Selection

Most classifiers can only utilize a relatively small number of features. Examples are the parallelepiped, minimum distance, Mahalanobis, maximum likelihood, and back-propagation neural network approaches [Richards, 1993]. For those methods, it is not feasible to use more than 10-20 features because of dimensionality problems. However, one set of features may be most valuable for distinguishing between two particular classes, while an entirely different set of features may be optimal for separating two other classes.

The goal of the paired histogram method is to find a set of features for each pair of classes which provides optimal separation between the pair. For each feature (F), the normalized histograms I_F and J_F are created for classes i and j, respectively. The two histograms are scaled to the minimum and maximum of their combined range for that feature and are discretized into 256 bins. The histograms of each pair of classes are compared, and two measures, overlap (O) and divergence (DIV), are computed as follows:

O(F)_{ij} = \sum_{x=1}^{256} I_F(x) J_F(x)                          (1)

DIV(F)_{ij} = |\mu_i - \mu_j| / (\sigma_i + \sigma_j)               (2)

where \mu_i and \mu_j are the means of F for classes i and j, respectively, \sigma_i and \sigma_j are the standard deviations of F for classes i and j, respectively, and x is the histogram bin. This process is repeated for each feature. The result is that for each class, 185 histograms corresponding to the 185 features are generated. The divergence and overlap measures are computed for each pair of classes.

For each pair of classes, the features are sorted by largest divergence. Features with equal divergence values then are sorted by smallest overlap value. However, some features may contain redundant information. Therefore the following procedure is adopted. For each pair of classes, the first feature in the sorted list is chosen. Next, the correlation between the first feature and successive features in the list is computed. Only the training vectors from the two classes under consideration are used in the correlation computation. Features with correlations greater than 0.9 are discarded. The next feature in the list with a correlation less than or equal to 0.9 is accepted. The final feature selected from the list must satisfy the condition that the correlation between it and each of the other two features is less than or equal to 0.9. The resulting three features, F1, F2, and F3, are those that provide a high degree of separation between that pair of classes with the least redundancy. These features, or a subset described in section 2.5, are used by all the classifiers.
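The following sketch illustrates the feature-ranking step for one class pair: overlap and divergence are computed from the paired histograms as in equations (1) and (2), features are sorted by largest divergence (ties broken by smallest overlap), and candidates whose correlation with an already accepted feature exceeds 0.9 are discarded. This is an interpretation of the procedure described above, not the authors' implementation; the array layouts and helper names are assumptions.

```python
# Minimal sketch of the paired-histogram feature ranking (equations (1)-(2)
# and the 0.9 correlation filter). `feats_i`, `feats_j` are assumed to be
# (n_samples, n_features) arrays of training vectors for classes i and j.
import numpy as np

N_BINS = 256

def overlap_and_divergence(f_i, f_j):
    """Overlap (eq. 1) and divergence (eq. 2) of one feature for classes i, j."""
    lo = min(f_i.min(), f_j.min())
    hi = max(f_i.max(), f_j.max())
    I, _ = np.histogram(f_i, bins=N_BINS, range=(lo, hi))
    J, _ = np.histogram(f_j, bins=N_BINS, range=(lo, hi))
    I = I / I.sum()                          # normalized histogram I_F
    J = J / J.sum()                          # normalized histogram J_F
    overlap = np.sum(I * J)                                           # eq. (1)
    div = abs(f_i.mean() - f_j.mean()) / (f_i.std() + f_j.std())      # eq. (2)
    return overlap, div

def select_three_features(feats_i, feats_j, max_corr=0.9):
    """Rank by divergence (ties by smaller overlap); keep the first three
    features whose pairwise correlation does not exceed 0.9."""
    n_feat = feats_i.shape[1]
    scores = [overlap_and_divergence(feats_i[:, k], feats_j[:, k])
              for k in range(n_feat)]
    order = sorted(range(n_feat), key=lambda k: (-scores[k][1], scores[k][0]))
    both = np.vstack([feats_i, feats_j])     # only these two classes are used
    chosen = [order[0]]
    for k in order[1:]:
        if all(abs(np.corrcoef(both[:, k], both[:, c])[0, 1]) <= max_corr
               for c in chosen):
            chosen.append(k)
            if len(chosen) == 3:
                break
    return chosen
```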
Table 3. Explanation of Features

Feature     Explanation
1-6         six calibrated channel values of AVHRR channels 1, 2, 3, 4, and 5 and the reflectance of channel 3 (channel 6)
7-21        15 channel ratios, A/B, where A and B are the channels listed above, that is, ch1/ch2, ch1/ch3, ch1/ch4, ch1/ch5, ch1/ch6, ch2/ch3, ch2/ch4, ch2/ch5, ch2/ch6, ch3/ch4, ch3/ch5, ch3/ch6, ch4/ch5, ch4/ch6, ch5/ch6
22-36       15 channel differences, A - B
37-51       15 values of arctan(A/B)
52-66       15 2-D Euclidean distances for channels A and B, that is, (A^2 + B^2)^{1/2}
67-86       20 3-D Euclidean distances for channels A, B, and C
87-101      15 normalized differences, (A - B)/(A + B)
102-161     20 sets of hue, saturation, and value (HSV) for channel combinations of A, B, and C (note each combination of A, B, and C produces three features, for a total of 60 features)
162-185     hue, saturation, and value for each of the following (24 features): [ch1, ch1-ch4, ch6], [ch1-ch2, ch1-ch4, ch1-ch6], [ch2-ch3, ch3-ch4, ch4-ch5], [ch2-ch3, ch3-ch4, ch4-ch6], [ch1-ch2, ch2-ch3, ch3-ch4], [ch3-ch4, ch4-ch5, ch3-ch5], [ch1-ch2, ch1-ch3, ch1-ch4], and [ch1, ch4-ch5, ch6]

AVHRR, advanced very high resolution radiometer; ch, channel; 2-D, two-dimensional; 3-D, three-dimensional.

2.5. Ancillary Data

In addition to the features described in section 2.4, three other types of information are available to the classifiers: a water mask, terrain maps, and a sunglint probability. The water percentage is taken from the Fleet Numerical Oceanography Center [1992] (fnocwat.img file) and the National Oceanic and Atmospheric Administration - Environmental Protection Agency [1992]. This file is in raster format, with each pixel representing a 10-min by 10-min region on the Earth. It is used to determine whether a pixel is over land or ocean. If the percentage of water in the 10-min by 10-min region is 100%, then the pixel is considered to be over open ocean, and all land classes are eliminated as possible classes. Care is taken to account for at least large rivers, lakes, and islands. Likewise, the 10-min spatial resolution terrain map is used to identify mountainous regions. The snow-covered mountain subclass is allowed only over appropriate terrain. Finally, the sunglint probability mask identifies regions of potential sunglint. The sunglint class is not allowed outside of these regions.

Early in the investigation it became clear that sunglint was easily confused with other classes and was frequently misclassified as cloud. Therefore a sunglint probability measure was devised to assist in that discrimination. At each pixel, the solar zenith angle, the viewing angle, and the azimuth difference are already computed. With these angles, the orientation necessary for the sea surface to produce specular reflection under the current viewing geometry can be calculated. The angles necessary for this calculation are shown in Figure 1. We define the following: n_0 is the unit vector pointing to the Sun, n_s is the unit vector pointing to the satellite, and n is the unit normal vector of the sea surface producing specular reflection. Using the following relations, one can solve for the components of n:

n . (n_0 x n_s) = 0,    n . n_0 = n . n_s                           (3)

that is, n lies in the plane of n_0 and n_s and bisects the angle between them, so that n . n_0 = n . n_s equals the cosine of half the angle between n_0 and n_s.

[Figure 1. Angles needed for the sunglint probability computation. Note that the relative azimuth angle, the observation angle theta_s, and the solar zenith angle theta_0 are shown, and Theta = cos^{-1} n_z as defined in the paper.]

Once n = (n_x, n_y, n_z) is found, where n_z gives the cosine of the angle between n and the local zenith, the distribution found by Cox and Munk [1954] can be used, with the assumption of isotropy, to estimate the percent relative probability of sunglint, that is,

100% x exp[ -(1/2) (Theta / 8.5 deg)^2 ]

where

Theta = cos^{-1} n_z                                                (4)

and 8.5 (degrees) is interpolated from Cox and Munk [1954, Figure 3]. This probability is used by the classifiers to improve the cloud/sunglint distinction.
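A minimal sketch of the sunglint probability computation follows. The specular-facet normal n is obtained as the normalized bisector of the Sun and satellite unit vectors, which satisfies the relations in equation (3), and the Cox and Munk distribution with the 8.5 degree width is then evaluated at Theta = cos^{-1} n_z as in equation (4). The azimuth convention (relative azimuth of 180 degrees at the specular point) and the helper functions are assumptions made for illustration, not the code used in this study.

```python
# Sketch of the sunglint probability of section 2.5 (equations (3) and (4)).
import numpy as np

def unit_vector(zenith_deg, azimuth_deg):
    """Unit vector with the given zenith and azimuth angles (z = local zenith)."""
    z, a = np.radians(zenith_deg), np.radians(azimuth_deg)
    return np.array([np.sin(z) * np.cos(a), np.sin(z) * np.sin(a), np.cos(z)])

def sunglint_probability(sun_zenith, sat_zenith, rel_azimuth, width_deg=8.5):
    n0 = unit_vector(sun_zenith, 0.0)            # toward the Sun
    ns = unit_vector(sat_zenith, rel_azimuth)    # toward the satellite
    n = n0 + ns                                  # bisector: n.(n0 x ns) = 0 and
    n = n / np.linalg.norm(n)                    #           n.n0 = n.ns
    theta = np.degrees(np.arccos(np.clip(n[2], -1.0, 1.0)))   # Theta = arccos(n_z)
    return 100.0 * np.exp(-0.5 * (theta / width_deg) ** 2)    # percent probability

# Example: this geometry is well away from the specular direction, so the
# estimated glint probability is low (a few percent).
print(sunglint_probability(sun_zenith=30.0, sat_zenith=20.0, rel_azimuth=60.0))
```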
3. Methodology

Four classification schemes are described below: maximum likelihood, paired histogram, class elimination, and a back-propagation neural network.

3.1. Maximum Likelihood Classifier

The maximum likelihood technique is among the most popular supervised classification schemes for classifying remote sensing data. A description of the technique is given by Duda and Hart [1973] and Richards [1993]. The maximum likelihood method uses the probability distributions of the features to perform classification. As is commonly done, the maximum likelihood method used here makes the assumption that the distribution function for each feature is normal. The probability that a vector x is in class j, p(c_j | x), is estimated using Bayes' theorem, that is,

p(c_j | x) = p(x | c_j) p(c_j) / p(x)                               (5)

where

p(x | c_j) = (2 pi)^{-M/2} |Sigma_j|^{-1/2} exp[ -(1/2) (x - mu_j)^T Sigma_j^{-1} (x - mu_j) ]    (6)

and mu_j is the mean vector for class j, Sigma_j is the covariance matrix, M is the dimension of the feature vector, and the superscript T denotes the transpose.

The feature selection method described in section 2.4 selects and ranks the three best features that distinguish between each class pair. For the maximum likelihood classifier, only the top feature for each pair is retained. Given N classes, there are N(N-1)/2 possible features. However, the set of selected features is considerably smaller than this number because some features are common to several class pairs. This process typically yields 20-70 features, which still results in a very large covariance matrix. Furthermore, some of the features in the set may be highly correlated with other features, resulting in a nearly singular matrix. To avoid numerical instability and to reduce the number of features to a more manageable size, the correlation matrix for this set of features is calculated, and highly redundant features are deselected. Note that the entire set of training vectors is used in this process. The final number of features varies from 7 to 11 for the different regional and global classifiers described in section 4.
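The sketch below shows the Gaussian maximum likelihood rule of equations (5) and (6): class means and covariances are estimated from the training vectors, and a pixel is assigned to the class with the largest log likelihood. Equal prior probabilities p(c_j) are assumed here for simplicity; this is an illustration of the standard technique under stated assumptions, not the code used in this study.

```python
# Minimal sketch of a Gaussian maximum likelihood classifier (equations (5)-(6)).
import numpy as np

class GaussianMLClassifier:
    def fit(self, X_by_class):
        """X_by_class: dict mapping class label -> (n_samples, n_features) array."""
        self.params = {}
        for label, X in X_by_class.items():
            mu = X.mean(axis=0)
            cov = np.cov(X, rowvar=False)
            self.params[label] = (mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
        return self

    def log_likelihood(self, x, label):
        """log p(x | c_j) up to an additive constant (equation (6))."""
        mu, cov_inv, logdet = self.params[label]
        d = x - mu
        return -0.5 * (logdet + d @ cov_inv @ d)

    def classify(self, x):
        """Equation (5) with equal priors p(c_j): maximize the likelihood."""
        return max(self.params, key=lambda label: self.log_likelihood(x, label))
```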
3.2. Paired Histogram Classifier

The features described in section 2.4 form the basis of the paired histogram technique. The histograms, which are used as discriminators between pairs of classes, determine the result of a ballot accumulation algorithm. Given a class pair i and j, three features are chosen based upon divergence and overlap as described in section 2.4. The histogram for feature F in class i is denoted by I_F and that of feature F in class j is denoted by J_F. To classify a pixel Z in an image, the feature vector must first be computed for that pixel. For each feature F, the pixel feature value Z_F is scaled to the appropriate range for the histograms I_F and J_F (e.g., Figure 2). The feature value is mapped to the appropriate bins of the two histograms I_F and J_F. If I_F > J_F, class i receives one vote; if I_F < J_F, class j receives one vote; and if I_F = J_F, neither class receives a vote.

[Figure 2. (a)-(c) Examples of three paired histograms, plotting the frequency of occurrence of training vector values for the classes land (4) and water cloud (6): (a) the feature (ch1 - ch5)/(ch1 + ch5), (b) the saturation of channels 2, 4, and 5, and (c) the value (intensity) of channels 2-3, 3-4, and 4-6. (d)-(f) The histograms that result from performing the canonical transformation on a combination of the three histograms of Figures 2a-2c (first, second, and third canonical axes).]

For example, if feature 1 is (ch1 - ch5)/(ch1 + ch5) and its value for pixel Z is 0.3, then in Figure 2 we see that 0.3 falls within the range where the histogram for class 4, land, is greater than that for class 6, water cloud, and class 4 gets one vote. Note that in many cases, neither class will receive a vote. This procedure is applied to all pairs of classes for all features, and the class receiving the largest number of votes is declared the winner. In the case of ties, the classes not involved in the tie are eliminated, and the procedure is repeated.
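A sketch of the ballot accumulation described above follows. For every class pair, the pixel's value of each selected feature is mapped onto the common 256-bin range of the stored pair of normalized histograms, and the class with the taller histogram at that bin receives one vote. The data structures (and the handling of an empty ballot) are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of the paired-histogram ballot accumulation of section 3.2.
import numpy as np
from itertools import combinations
from collections import Counter

N_BINS = 256

def bin_index(value, lo, hi):
    """Map a feature value onto the common 256-bin range of a histogram pair."""
    b = int((value - lo) / (hi - lo) * N_BINS)
    return min(max(b, 0), N_BINS - 1)

def classify_pixel(pixel_features, classes, pair_models):
    """pair_models[(i, j)] is assumed to be a list of tuples
    (feature_id, lo, hi, hist_i, hist_j) for the three features of pair (i, j)."""
    votes = Counter()
    for i, j in combinations(classes, 2):
        for feat, lo, hi, hist_i, hist_j in pair_models[(i, j)]:
            b = bin_index(pixel_features[feat], lo, hi)
            if hist_i[b] > hist_j[b]:
                votes[i] += 1
            elif hist_i[b] < hist_j[b]:
                votes[j] += 1
            # equal bin heights: neither class receives a vote
    if not votes:
        return None       # no votes cast; defer to another classifier
    winner, _ = votes.most_common(1)[0]
    return winner         # ties would be re-balloted among the tied classes
```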
3.3. Class Elimination Approaches

The paired histogram approach described above used the combination of divergence, overlap, and correlation to reduce the number of features considered for classification. An analogous procedure may be used to decrease the number of classes considered when making a classification. A hybrid approach, one that combines elements of the paired histogram technique with the maximum likelihood classifier, was designed for this purpose.

The set of potential classes is first reduced by performing a canonical transform on the histograms generated for each pair of classes. The canonical transform, described by Richards [1993], is similar in concept to principal component analysis. The principal component transformation often is used to map the image data into a new uncorrelated vector space. It produces a space in which the data have the largest variance along the first axis, have the next largest variance along a second, mutually orthogonal axis, and so on. However, this approach is based upon the global covariance of the full data set and is not sensitive to class structure in the data. The canonical transformation offers an alternative approach in which the classes have the largest possible separation between their means when projected onto the new axis.

In the present investigation, a canonical transformation is made for each class pair. The histograms in the left column of Figure 2 are examples of three paired histograms used in the approach described in section 2. These histograms plot the frequency of occurrence of training vector values for the classes land (4) and water cloud (6). For example, the histogram in Figure 2a shows the plot for the feature (ch1 - ch5)/(ch1 + ch5). The histograms in the right column are the result of performing the canonical transformation on a combination of the three histograms in the left column. The technique for combining these histograms is given by Richards [1993]. The graph in Figure 2d shows the eigenvalue for the first canonical axis. Note that the eigenvalue is 58.4 for the graph in Figure 2d and is 4.7e-06 for the graphs in Figures 2e and 2f. This demonstrates that the maximum class separability information can be extracted from the first canonical axis. This was the case for all of the pairwise histograms.

The class elimination procedure may be described as follows. For a given pixel to be classified, compute the canonical value corresponding to a class pair, for example, 1 and 2. If that canonical value does not lie within the region defined by the histogram of class 1, then class 1 is hereafter eliminated from any further consideration. If the canonical value does lie within the region defined by class 2, then class 2 is potentially a correct class, and further tests are run. If the canonical value lies outside of the histograms describing both class 1 and class 2, then both classes are eliminated from further consideration. This process is continued with other noneliminated class pairs. For example, if class 1 is eliminated at some stage, then no further tests between classes 1 and 3, classes 1 and 4, etc., are performed. If class 2 remains, then tests are made between classes 2 and 3, classes 2 and 4, etc., until class 2 is eliminated. In the process, classes 2, 3, and 4 may be eliminated. Then the next test would be between classes 5 and 6, etc., until the list is exhausted.

The result of this process is that typically one to four classes remain. If only one class remains, then this is declared the winner. If two to four classes remain, then these are sent to the maximum likelihood classifier for final processing. However, the maximum likelihood classifier used in the final stage is very fast, since the multidimensional space is significantly reduced. On rare occasions, it may occur that all classes are eliminated. Should this occur, then the elimination procedure has failed, and the full maximum likelihood classifier is utilized. Obviously, this procedure cannot possibly succeed if the correct class is improperly eliminated.
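The sketch below illustrates the class elimination idea under simplifying assumptions: for each class pair, a one-dimensional two-class canonical (Fisher-type) axis is computed, each class retains the interval spanned by its projected training values as the "region defined by its histogram," and a class is eliminated when the pixel's projection falls outside its interval. The surviving classes (typically one to four) are then passed to the maximum likelihood classifier. The choice of axis and the interval test are assumptions standing in for the canonical transform of Richards [1993] used in the paper.

```python
# Sketch of per-pair canonical projection and class elimination (section 3.3).
import numpy as np
from itertools import combinations

def canonical_axis(Xi, Xj):
    """Direction maximizing the separation of the two class means relative to
    the pooled within-class scatter (a two-class canonical/Fisher axis)."""
    Sw = np.cov(Xi, rowvar=False) + np.cov(Xj, rowvar=False)
    w = np.linalg.solve(Sw, Xi.mean(axis=0) - Xj.mean(axis=0))
    return w / np.linalg.norm(w)

def build_pair_models(train_by_class):
    """For each class pair, store the axis and each class's projected interval."""
    models = {}
    for i, j in combinations(sorted(train_by_class), 2):
        w = canonical_axis(train_by_class[i], train_by_class[j])
        pi, pj = train_by_class[i] @ w, train_by_class[j] @ w
        models[(i, j)] = (w, (pi.min(), pi.max()), (pj.min(), pj.max()))
    return models

def eliminate(x, classes, models):
    alive = set(classes)
    for i, j in combinations(sorted(classes), 2):
        if i not in alive or j not in alive:
            continue                   # no further tests with eliminated classes
        w, (ilo, ihi), (jlo, jhi) = models[(i, j)]
        p = x @ w
        if not (ilo <= p <= ihi):
            alive.discard(i)
        if not (jlo <= p <= jhi):
            alive.discard(j)
    return alive   # 1 class -> winner; 2-4 -> reduced ML; 0 -> full ML fallback
```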
3.4. Neural Network Classifier

The multilayer perceptron network, trained using the back-propagation algorithm, that was implemented for this investigation is described in many sources, such as Rumelhart et al. [1986] and, more recently, in an article by Paola and Schowengerdt [1995]. A single perceptron, or node, receives one or many external inputs on weighted input lines, computes the weighted sum of the inputs, and generates an output which is a function of that sum. The computed function generally is nonlinear and continuous, which produces a mapping from the input space to the classification space. Multilayer perceptron networks contain perceptrons arranged into an input layer, an output layer, and one or two hidden layers. Since arbitrary decision surfaces can be constructed with two hidden layers, more than two will not add functionality. In this investigation, a single hidden layer proved sufficient.

The knowledge of the network is stored in the connection weights, which are set to small random numbers when the network is initialized. During training, the error for a particular iteration is computed as the aggregate squared difference between the desired network outputs and the outputs actually produced by the weights for that iteration (t):

Error(t) = (1/2) \sum_i (Actual_i(t) - Desired_i(t))^2              (7)

where Desired_i is the true class for vector i and Actual_i is the class selected by the network. The weights are changed using the back-propagation learning algorithm, which is based on the gradient descent error minimization technique. The change in a weight is determined by

w_ij(t+1) = w_ij(t) + Delta w_ij(t)                                 (8)

where

Delta w_ij(t) = -eta (dError(t)/dw_ij) + alpha Delta w_ij(t-1)      (9)

The constant eta, called the learning rate, was set to 0.1 for all experiments. Also note that the second term in (9) is a momentum term, added to speed convergence. The momentum constant alpha adds a percentage of the previous weight update to the current update. An alpha value of 0.5 was used in all experiments.

These equations give weight updates for one presentation of a training vector. The process of adjusting the weights is repeated until the error in the network stabilizes. This procedure does not guarantee a global minimum and most often finds a local minimum. The network was trained on the same samples as the other classifiers and used the same features as the maximum likelihood classifier.

4. Results

This investigation was conducted in order to construct an operational global algorithm for detecting clouds over all types of surfaces in support of the EOS CERES Science Team. There are two primary considerations: accuracy and computational efficiency. However, there are a number of other important issues, including the choice of the classifier, whether or not specially designed features enhance classification accuracy, whether or not it is beneficial to subdivide the global classifier into subregions, whether or not the use of ancillary data is beneficial, and whether or not dividing classes into subclasses is beneficial. In each case, results are presented in terms both of classification accuracy and computational cost. As mentioned in section 2.1, the labeling of the training and testing samples is critical to the performance of the classifier. It is likely that some small amount of error exists in the samples as a result of analyst error. Thus the reported results are relative to the accuracy of the human analyst.

Classifiers were constructed for the polar regions, for South America (SA) during periods of intense biomass burning, for desert regions stretching from Morocco to Afghanistan, for the "other" data (described as global but excluding the polar, biomass-burning, and desert regions), for the global data (the combination of the polar, SA, desert, and other regions), and for the twilight data (samples taken from the polar regions). Note that the classifiers are for daylight conditions only, defined for solar zenith angles up to theta_0 = 80 deg. The twilight classifier is a special case which specifically targets solar zenith angles in the range 80 deg <= theta_0 <= 85 deg. This is important because virtually all operational cloud classification schemes are limited to values of theta_0 less than 75-80 deg, because of the low reflected radiances caused by the low Sun angles. Table 4 shows the summary of overall classification accuracies and the computation times required to classify 100,000 pixels using a Silicon Graphics Octane workstation. Two sets of experiments are listed for comparison, one using the generated features and the other simply using the channel data, as discussed in section 4.2. The results demonstrate that the derived features are more accurate but require additional computation. Note that there is only a single entry for the class elimination scheme, since it always begins with the full feature set. This classifier performs the elimination based on the features which best separate pairs of classes. Unlike the neural network, if nonlinear combinations of features are not present in the feature set, the class elimination technique cannot create them. Thus the technique is not effective if only the six channel values are used as features.

4.1.
Choice of Classifier There are a large numberof classificationapproaches which have been reportedin the literatureand, clearly, not all of them are included in this study. The four approachesthat were selectedfor comparisonare the traditional maximum likelihood method(ML), a new paired histogramapproachwhich utilizes used as features. below. Note The results of both sets of tests are discussed also that the results in Table 4 use subclass informationfor training (discussedin section4.4) and ancillary data during classification(discussedin section4.3). The overall accuraciesfor the polar region are 94.3% (ML), 94% (CE), 94.5% (NN), and 92.8% (PH). Table 5 gives the confusionmatricesfor the polar region for the ML, PH, CE, and NN full feature classifiers. The row number indicates the actual class, and the column number indicates the class that the classifier chose (class numbers are featureswhich best separateclasspairs (PH), a hybridpaired explainedin Table 2). For example,in the ML classifierportion histogram/maximumlikelihood approachbased upon class of Table 5, the value 2.1 in column 1 of row 2 indicates that elimination (CE), and a back-propagation neural network (NN). Table 4 showsthe resultsfor eachof the following regions:(1) the polarregions(polewardof 60ø latitude),(2) SouthAmerica (SA) with periodsof intensebiomassburning,(3) desertregions (limited to the Middle East, stretching from Morocco to 2.1% of the class2 pixels were incorrectlyclassifiedas class 1. All four classifiers have high accuraciesfor the water, land, water cloud, and sunglint classes;the main problem is the separationof the snow/iceclassfrom the ice cloud(cirrus)class. The ice cloud class accuraciesare 85.2% (ML), 84.3% (CE), Table 4. Overall AccuracyandComputationTimes for the ML, CE, NN, andPH Methods for Each of the Regionaland Global ClassifiersUsing Both the CompleteSet of Features (Full) and the Smaller Six-Channel Feature Set ML CE Full NN PH Classifier Full Six-ch Full Six -ch Full Six -ch Polar 94.3 94.0 South America Desert 96.8 95.0 94.0 93.7 94.0 94.5 95.4 92.8 86.5 95.8 94.2 94.1 95.8 96.5 95.4 94.4 93.7 90.6 84.7 Other 94.6 Global 89.8 93.4 93.0 96.4 95.0 91.0 89.1 90.4 88.6 92.4 91.2 89.1 Twilight 93.0 81.5 90.1 93.0 93.1 92.1 92.0 88.3 Polar 11.8 10.6 11.0 5.2 1.9 17.0 16.9 South America 10.2 8.3 8.2 4.4 2.7 10.2 10.1 Desert 16.5 9.1 9.3 7.1 2.0 13.2 13.3 Other 12.4 8.9 9.5 5.9 3.4 13.5 13.1 Global 12.4 6.8 7.7 8.5 3.5 14.5 14.3 Accuracy,% Computational Time, s 6208 BERENDES ET AL.: COMPARISON 1 2 3 I 98.9 0.1 2.1 0.0 0.4 0.0 0.0 92.1 12.2 0.0 1.5 0.0 95.0 3.0 2 3 4 5 17 1.6 0.0 0.0 0.0 0.0 84.9 9.1 0.3 1.1 0.0 1 2 97.8 1.7 0.1 92.3 3 4 5 17 0.0 0.1 0.0 0.0 12.8 0.8 1.4 0.0 I 2 3 4 5 17 99.0 3.9 0.0 0.3 0.0 0.0 0.0 89.2 8.9 0.4 1.2 0.0 4 5 17 0.0 0.0 0.9 0.1 2.3 85.2 0.0 0.1 0.0 0.0 0.0 97.8 0.3 0.0 3.5 2.6 1.8 97.9 0.0 0.1 0.0 0.0 0.3 100.0 0.0 0.0 1.7 0.3 9.8 88.5 0.0 0.4 0.0 0.0 0.0 97.2 0.5 0.0 3.7 2.4 2.1 97.9 0.0 0.0 0.0 0.3 0.2 100.0 0.0 2.0 0.0 0.0 2.0 3.9 0.1 0.0 84.3 0.0 0.0 0.0 0.0 95.7 0.3 0.0 2.9 3.3 97.8 0.0 0.0 0.2 0.5 100.0 0.9 0.0 0.0 96.9 0.2 5.1 0.0 2.2 0.3 2.2 98.2 0.4 0.2 1.0 0.0 0.2 0.3 94.5 MLClass•er 2 3 4 5 17 PH •ass•er I classifiers AVHRR confused some of the smoke/haze NNc•ss•er Table Class I 2 I 99.9 0.0 0.0 2 0.2 98.5 1.1 3 0.0 0.2 classifiers, but for the fundamental task of 4 0.1 identifyingcloudpixels in the polar region,the NN approachis marginally superior. 
A sample sceneis included in Plate 2 whichdepictsthe classification madeby the ML, NN, and PH 5 0.0 17 the polar regions,are underdetected with all classifiers. The converseconfusion,snow/iceinterpretedas cloud,alsooccurs6 - 13% of the time. The overall accuraciesare comparablefor the and NN thin cloud. 6. Confusion Matrices in South America 5 17 27 0.0 0.0 0.1 0.0 0.1 0.1 0.0 0.0 99.5 0.0 0.3 0.0 0.0 0.0 0.0 96.9 0.2 0.0 2.7 1.3 0.4 0.7 96.4 1.0 0.1 0.2 0.0 0.0 0.0 2.6 97.2 0.0 27 0.2 0.0 0.0 9.8 5.3 1.1 83.6 90.8% (NN), and 88.5% (PH), with the major misclassification as snow/ice. This meansthat ice clouds,which are prevalentin ML with Clearly, a more robust haze/smokealgorithm is needed for biomass-burning regions. For this region, the ML approachis superior. It exhibits better performancefor the haze/smoke pixelsanddoesnothavedifficultyidentifyingsunglint.Sunglint pixelsweredifficultfor thenetworkin everyregionstudied. Overall accuraciesfor the desert region were 95% (ML), 94.2% (CE), 95.8% (NN), and 93.7% (PH). Corresponding confusionmatricesfor each or the classifiersare given in Table 7. The brightdesertregionsare confusedwith cloudby boththe ML and PH classifiers and the with smoke/hazeby the PH classifier. The NN has very high accuracy(99.4%) for these pixels. Water clouds,which are muchmore infrequentin this data set than in otherregions,were a sourceof errorfor all four classifiers. The accuracieswere 84% (ML), 83.4% (CE), 74.8% (PH) and67% (NN). In this region,sunglintis misclassified by CEclass•er 0.0 3.6 90.8 0.0 0.1 0.0 USING distinguish between water cloud and ice cloud to ensure accuracy.Thosecloudsare,in fact, waterclouds. For South America, overall accuracies of 96.8% (ML), 95.8% (CE), 94.1% (NN), and 94.4% (PH) were achieved (Table 4). Corresponding confusionmatricesfor each of the classifiersare given in Table 6. The primary difficulty is haze/smoke,which is prevalentin the data set chosen,with 83.6% (ML), 82.9% (CE), 77%(PH), and 75% (NN) accuracies. The majority of the errors occur when smoke/haze is misclassifiedas land becauseof the spectralsignatureof the underlyingsurfaceshowingthroughthe haze. Similarly, all the Table 5. ConfusionMatrices in the Polar Region Class OF CLASSIFIERS 3 4 MLClass•er PHC•ss•er classifiers. The scene includes land, ice/snow, thin water cloud, multilayer water cloud, and ice cloud. Note that both the multilayerwater cloud,which can be seenin the lower right of I 97.5 0.0 0.0 0.0 0.0 2.5 0.0 2 3 0.0 0.0 98.6 0.6 0.5 99.1 0.0 0.0 0.8 0.3 0.0 0.0 0.0 Plate 2, and the thin water cloud, which covers the central 4 0.1 0.0 0.0 93.1 0.6 0.7 5.5 5 17 27 0.0 0.2 0.0 1.3 0.0 0.0 0.6 0.0 0.0 1.1 0.0 14.3 93.5 4.2 6.4 3.3 95.7 2.3 0.3 0.0 77.0 I 97.9 0.0 0.0 0.0 0.0 2.1 0.0 2 0.0 96.9 0.5 0.6 1.3 0.1 0.7 3 0.0 0.2 99.3 0.0 0.1 0.0 0.4 4 5 0.1 0.0 0.0 1.2 0.0 0.4 97.0 0.7 0.2 95.6 0.0 2.0 2.7 0.1 portionof the image, are both classifiedas water cloud. The resultsare similar for all three classifiers;however,the problem notedabovecan be seenin the upperleft cornerof the classified images.The specksof white containedwithin the pink ice cloud are ice cloud pixels misclassifiedas ice/snow. As mentioned previouslyin this section,theneuralnetworkhasthefewestsuch misclassifications. Problems with ice cloud versus ice/snow motivatedanotherset of experimentswhich demonstratedthat the ice cloud versus ice/snow ambiguity can be removed by augmentingthe data set. 
Using SIVIS, ice cloudpixels which 0.1 CEclass•ier 17 0.2 0.0 0.0 0.0 2.6 97.3 0.0 27 0.0 0.0 0.0 10.3 5.2 1.7 82.9 scenes,relabeled with the correct classification, and added to the 1 98.5 0.0 0.0 0.0 0.0 1.4 0.0 trainingset. Selectingand addingsamplesto fine tuneregional classifiers in this way is an effective tool for improving classification accuracy.Note, also,that the lowerleft cornerhas pink cloudswhichappearto be ice cloud. Oneof thedifficulties with histogramequalizationis the potentialfor exaggerating the intensityof thepixels. The analystmustusetemperature datato 2 0.0 93.1 3.8 0.0 3.1 0.0 0.0 0.0 had been misclassified as ice/snow were extracted from the NNc•ss•er 3 0.0 0.4 99.6 0.0 0.0 0.0 4 0.0 0.0 0.0 95.7 1.2 0.0 3.1 5 0.0 0.2 0.8 0.5 96.2 2.2 0.1 17 4.4 0.0 0.0 0.0 0.0 0.0 0.0 7.1 12.5 10.2 88.5 2.2 75.0 27 0.0 BERENDES ET AL.:COMPARISON OFCLASSIFIERS USINGAVHRR 6209 6210 BERENDES ET AL.: COMPARISON OF CLASSIFIERS USING AVHRR Table 7. ConfusionMatricesin the DesertRegion Class 1 2 3 1 97.4 0.0 0.1 2 0.0 97.3 1.6 3 0.0 0.0 4 0.4 5 14 17 27 3.5 0.0 6.3 0.0 0.1 0.7 0.0 0.0 0.0 4 27 lower for all classifierson this data set. A comparisonwith the regional classification results given above demonstratesthat significantimprovementis possibleby subdividingthe globe into regions and tuning the regional classifiers to local conditions. As expected,the sameproblemswith sunglintand 1.6 0.0 smoke/haze exist for this data set, but accuracies decrease in 0.0 0.0 almostall categories. Classificationof the pixels in the twilight data, which is not included in the global results given, is 10% less accuratewhen thosepixels are incorporatedinto the global 5 14 17 0.0 0.9 0.0 0.5 0.6 0.0 99.8 0.0 0.2 0.0 0.0 0.0 0.2 96.0 0.0 0.0 0.0 0.0 0.6 0.0 0.0 0.7 2.8 84.0 6.6 12.4 0.5 0.2 4.0 93.4 0.0 5.0 0.2 4.7 0.0 80.5 6.1 0.1 2.5 0.0 0.7 87.7 MLClass•er classifier. I 97.9 0.0 0.1 0.0 0.4 0.0 1.6 0.0 For the twilight data, the overall accuraciesare 93% (ML), 93% (CE), 93.1% (NN), and 92.1% (PH). Note that thesedata were taken from the polar regionsandhave a limited numberof potential classes,specifically,water, ice/snow, ice cloud, land, 2 0.0 97.0 0.3 1.4 1.3 0.0 0.0 0.0 and water 3 0.0 0.4 99.4 0.0 0.3 0.0 0.0 0.0 4 0.3 0.0 0.2 96.1 0.5 0.7 3.4 0.0 0.5 0.0 0.1 0.0 0.9 0.4 1.6 74.8 5.2 0.6 5 14 6.2 84.5 7.7 0.1 6.5 9.8 accurate for identifying cloud pixels, but both have approximately25% misclassificationof ice/snow. In contrast, the ML classifiercorrectlyidentifies94.8% of the ice/snowbut has difficulty distinguishingland from water (27% error). In spite of the misclassifications,the results are encouraging, demonstratingthat accuratecloud retrieval in sceneswith high solarzenith anglesis possible. Finally, the computationalcostof thesemethodscan be found in Table 4. It is clear that the neural network approachis the most efficient and should be used when its accuracy is comparableto one of the otherapproaches.For the polar, desert, biomass-burning and "other"data, the neuralnetworkprocessing time was 55% less than that requiredfor the ML classifierand was 57% less than for the PH method. 
For the global data, the PH •ass•er 17 8.3 0.0 0.0 0.0 15.5 0.0 74.0 2.2 27 1.8 0.0 0.0 2.1 2.7 8.6 0.0 84.8 I 96.8 0.0 0.1 0.0 0.5 0.0 2.4 0.2 2 0.0 98.5 0.0 0.6 0.9 0.0 0.0 0.0 3 4 0.0 0.4 0.0 0.1 99.5 0.2 0.0 94.8 0.5 3.8 0.0 0.2 0.0 0.3 0.2 CEclassg•er 0.0 5 3.3 0.7 0.0 0.7 83.4 4.1 5.1 2.7 14 0.0 0.0 0.0 0.0 6.3 93.6 0.0 0.0 17 6.3 0.0 0.0 0.0 13.0 0.0 80.0 0.7 27 0.0 0.0 0.0 0.7 0.5 5.0 6.1 87.7 I 2 97.2 0.0 0.0 96.1 0.2 1.2 0.3 1.2 0.4 0.0 0.0 1.1 2.3 0.0 0.0 0.0 3 0.0 0.9 99.1 0.0 0.0 0.0 0.0 0.0 4 0.4 0.0 0.2 98.8 0.4 0.0 0.0 0.1 5 14 3.4 0.0 0.3 0.0 0.0 0.0 1.9 0.5 67.0 0.0 7.9 99.4 5.8 0.0 13.7 0.0 17 27 4.1 0.0 0.0 0.0 0.0 0.0 0.0 6.1 16.0 0.9 0.0 9.1 74.9 0.0 5.0 82.9 NNc•ss•er all classifiers, with only 74-80% accuracies. Even the ML classifier,which performedwell on sunglintin SouthAmerica, has a 20% error rate. A visual inspectionof scenesfrom this area showsa greaterrangeof sunglintvaluesand a very gradual gradient between sunglint and water that overlaps with the spectralsignatureof water cloud. Becauseof the problemswith sunglint, note that most classifiersdescribedin the literature avoid sunglint regions. As with the previous data sets, haze/smokecontinuesto be difficult to distinguishfrom the underlyingsurface. On the basisof accuracyalone, neither the NN nor ML classifieris the clear choicefor this region. The overall accuraciesfor the "other" data are 94.6% (ML), 93% (CE), 96.4% (NN), and 91% (PH). Note that the polar, desert, and biomass-burningareas were specifically omitted from this data set. Sunglint continuesto be a sourceof error with only 72-79% accuracyand is primarily confusedwith water cloudand smoke/haze.The performanceon cloudclassification is comparablefor the NN and ML classifiers,makingit difficult to determine which would be the better choice for this region. The NN is more accurate for ice cloud and land, while the ML hasbetterperformancefor sunglintand smoke/haze.There were insufficient samplesin this data set to train for snow/ice and bright surface(desertlike) conditions. The results for the global data set are 89.8% (ML), 88.6% (CE), 92.4% (NN), and 89.1% (PH). Note that the accuracyis cloud. The NN and PH classifiers are the most decrease is much smaller, 31% for ML and 41% for PH, since the global nets are the largest and therefore the most computationally intensive. When comparedto the ML classifier, the CE method reduces the computation time from 7-44%. However, the accuracyof the CE methodis slightly lower than that of the ML classifierto which it converges.The decreasein accuracyof 0.8% in the desertregion and 1.2% in the global region provides a 44% and 38% reduction,respectively,in processingtime. A small increasein accuracyfor this method would make it a superior choice to that of the maximum likelihood method, particularly for an operational classifier. Note that the comparisonis betweenthe full featuredata setsfor the ML and CE classifiers. The six-channelML hascomparable accuracyand performanceto that of the full feature CE. However, it may not always be possibleto use such a small numberof lEatures. The advantageof the CE algorithmwill be mostapparentfor classification taskswhichuse a largenumber of features. 4.2. 
Choice of Features The classificationresultspresentedin section4.1 are based uponthe useof a largenumberof features,especiallyfor the PH method.However,thereis a considerable penaltyfor computing the various/Eatures and, in the case of the elimination methods, the canonicalhistograms.In the resultspresentedin section4.1, the ML and NN methodsused combinationsof 7, 9, 11, 10, 11, and 7 featuresfor the polar, SA, desert,"other", global and twilight classifiers,respectively.Conversely,the PH approach utilizes 82, 70, 94, 90, 118, and 42 features for these same regions.The questionaddressed in this sectionis the degreeto which the derived featurescontributeto higheraccuracyand at what cost. Table 4 shows the overall classification accuracies and computationtimes for 100,00 vectorsfor the ML, PH, and BERENDES ET AL.: COMPARISON OF CLASSIFIERS USING AVHRR 6211 NN classifiers,using both the full set of featuresdescribedin detail above and using only the original five AVHRR channels plus the computedreflectancevaluesof channel3. With the exception of the polar region, the ML method computationtime is reducedtbr all of the data sets. Reductions of 45% are shownfor the global and "other"classifiersand with little overall difference in classificationaccuracy. A detailed intercomparisonof the performancematrices reveals that, althoughthe overall accuraciesare similar, the accuraciesfor specific classes change. However, there were only three significantdifferences. The accuracyof the six-channelnetwork decreasedby 13% for the smoke/hazeclass in the desert and decreasedby 9% for the sunglint class in the "other" data set. Interestingly,the accuracyof the 6 channelnetworkincreasedby The ML accuracyfor the variouspolar, SA, desert, "other," global,and twilight classifiersdecreasedby 0.5- 1.5% when the ancillarydata were not used. There is very little advantagein using the ancillary data as additional inputs to the neural network. The highly nonlinearstructureof the NN approachcan adjust the weights without having to utilize this additional separation between two classes that was only visible by displayingall three bandsas the featurechannel 1 - channel2. The decrease in accuracy suggeststhat features essential to making class distinctionsin the twilight and "other" data sets were not being createdby the network but were includedin the derived feature set. More researchis necessaryto determine if a changein network topologywould impact the complexityof classificationapproachby effectivelyrequiringa much larger and more complexdecisionsurfaceto be generated,and it is more computationallyexpensive. The impact of training without information. This is true for both the full feature and six-channel classifiers. The situation is much different for the PH method. For the full set of features, use of the ancillary information leads to an increasein overall accuracyof 0.5- 3% for the polar, SA, desert, "other," global, and twilight classifiers. The differences are even larger for the six-channel approach,with increasesin accuracyof 0.5% to 4% and a 9% increasefor the desertregion. 12% for the smoke/haze class in South America. For all other The use of ancillarydata may also decreasethe computational cases,the six-channelapproachproducesequivalent accuracy times. Typical savings are 10-15% using the ML approach, because the classifier can eliminate certain classes from with a large reductionin computationalexpense. 
Table 4 showsthat the NN approachusing the six-channel consideration.The savingsare muchlarger for the PH approach, data achievedcomparableresultsfor all the data sets,with some with computationaltimes decreasingby as muchas 50%. The slight increases and some slight decreasesin accuracy for efficiencyof the neuralnetworkwasunchanged. specificclasses, and requiredonly 39-72% of the computation time required for the compoundfeature data. These lower 4.4. Use of Subclasses computationaltimes are causedby a decreasein the number of All of the resultspresentedabovehaveusedsubclasses.That input nodes, and a correspondingreductionin the number of means that the classifiersactuallyconsidereda muchlarger set weight updates required, and by removing the feature value of classes than are reported These subclasseswere then computations.For the twilight data, accuracydroppedby 1.0%, processwas completed.For for the "other" data it fell 1.4%, and for the global data the "collapsed"after the classification decreasewas 1.2%. Although these decreasesare small, the example, the water cloud classconsistsof sevensubclasses.All differenceswere investigatedusinga visualizationprogram. It is of the water cloud subclass results are collapsed after classification,and the final resultsare seenin the performance clear that the network "builds" features similar to those Obviously, this procedure complicates the discussedin section 2.3. For example, the network found a matrices. subclass distinctions was tested on the ML, PH, and NN classifiers. As expected,fewer classesresultedin reducedcomputation time for all threeclassifiers.The reductionswere40-50% (ML), 10-35% (NN), and 50-65% (PH). Less expectedwas the the combinations of features the network was able to approximateor if the missing featurescan be identified and resultingreductionin classificationaccuracyexhibitedby every classifier. The reductionsin classificationaccuracyof 1-3%, included in the feature vectors. demonstrate that many classes are not well clustered in Table 4 also showsthat while the computationaltimes are approximately equal for the full feature and six-channel multidimensionalspace. The subclassesform more compact classifiers using the PH method, there is a penalty in clusters which may be easier to separate. The significantly classification accuracy by 2-7%. Methods that are highly higher computationalexpenseis justified for this study since a nonlinear, such as ML and especially NN, suffer little from lossof accuracyof 2% can be significantfor the purposesof the usingthe six-channelfeatures. They are capableof warpingthe EOS CERES team. Note that similar experimentsusingthe sixdecisionsurthcessufficientlyto accommodatethe clustereddata channeldataproducedreductionsin accuracyof up to 8%. in each class. However, the PH method does not have this luxury. It simulatesnonlinearitythroughthe use of complex 5. Conclusions features, such as ratios and difference ratios. Thus the more nonlinear the classifier, the less it is useful to utilize complex features. 4.3. Utilizing Ancillary Data Sets This paper comparesthe accuracyand efficiencyof four classification schemes for identifyingpixelsin AVHRR imagery. The four classifiersrepresentthree distinctapproaches to this problem. The maximum likelihood classifier is a traditional All of the classifierswere modifiedto utilized ancillarydata statistical technique, the pairedhistogram methodis basedupon sets. 
4.4. Use of Subclasses

All of the results presented above have used subclasses. That means that the classifiers actually considered a much larger set of classes than are reported. These subclasses were then "collapsed" after the classification process was completed. For example, the water cloud class consists of seven subclasses. All of the water cloud subclass results are collapsed after classification, and the final results are seen in the performance matrices. Obviously, this procedure complicates the classification approach by effectively requiring a much larger and more complex decision surface to be generated, and it is more computationally expensive. The impact of training without subclass distinctions was tested on the ML, PH, and NN classifiers.

As expected, fewer classes resulted in reduced computation time for all three classifiers. The reductions were 40-50% (ML), 10-35% (NN), and 50-65% (PH). Less expected was the resulting reduction in classification accuracy exhibited by every classifier. The reductions in classification accuracy of 1-3% demonstrate that many classes are not well clustered in multidimensional space. The subclasses form more compact clusters which may be easier to separate. The significantly higher computational expense is justified for this study, since a loss of accuracy of 2% can be significant for the purposes of the EOS CERES team. Note that similar experiments using the six-channel data produced reductions in accuracy of up to 8%.
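A minimal sketch of the subclass strategy is given below: classification is carried out at the subclass level, and the subclass labels are then mapped onto their parent classes before the performance matrices are formed. The mapping shown is hypothetical and much smaller than the 20-subclass, 8-class scheme used in this study.

```python
# Hypothetical subclass-to-class mapping (illustration only).
SUBCLASS_TO_CLASS = {
    "water_cloud_1": "water_cloud",
    "water_cloud_2": "water_cloud",
    "water_cloud_3": "water_cloud",
    "ice_cloud_thin": "ice_cloud",
    "ice_cloud_thick": "ice_cloud",
    "snow": "snow_ice",
    "sea_ice": "snow_ice",
}

def collapse(subclass_labels):
    """Classify at the subclass level, then report results at the class level
    by collapsing each subclass label onto its parent class."""
    return [SUBCLASS_TO_CLASS.get(label, label) for label in subclass_labels]

print(collapse(["water_cloud_2", "sea_ice", "ice_cloud_thin"]))
# ['water_cloud', 'snow_ice', 'ice_cloud']
```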
5. Conclusions

This paper compares the accuracy and efficiency of four classification schemes for identifying pixels in AVHRR imagery. The four classifiers represent three distinct approaches to this problem. The maximum likelihood classifier is a traditional statistical technique, the paired histogram method is based upon algorithms for identifying surfaces which separate pairs of classes, and the neural network relies upon the network's ability to learn nonlinear separating surfaces which isolate each class from all others. The fourth classifier, the class elimination technique, is a variation of the maximum likelihood technique which uses a canonical transform to reduce the number of potential classes and, concomitantly, to increase classification speed. The class elimination technique is identical with the maximum likelihood method when no classes can be eliminated.

Although the fundamental goal of this research is to distinguish cloud from noncloud pixels in AVHRR imagery, all of the techniques were sufficiently robust to allow for the identification of many more classes and subclasses. Eight classes, which can be split into 20 subclasses, were selected for this study.

The regional comparisons were conducted on AVHRR LAC scenes from the polar regions, desert areas, and regions of biomass burning, areas which are known to be particularly difficult. Three additional data sets were created to test the generalization abilities of the classifiers. The "global" data set contains samples from all areas of the Earth, the "other" set contains global data minus vectors which would fall within the purview of the regional classifiers, and the "twilight" set contains vectors from the polar regions where the solar zenith angle is between 80° and 85°.

Accuracy is reported for sets of testing data which are independent of the training data; that is, the testing data are from different swaths than the training data and had not been previously used by the classifiers in any way. For the polar, desert, and biomass-burning regions, the maximum likelihood classifier achieved 94-97% accuracy, the neural network achieved 94-96% accuracy, and the paired histogram approach achieved 93-94% accuracy. The primary advantage to the class elimination scheme lies in its speed. Its accuracy of 94-96% is an average of 1% lower than that of the maximum likelihood method, but speedups of 7-44% for these regions are worthy of note. Although the overall accuracies are very similar, there are differences for specific classes, particularly for sunglint and smoke/haze. The maximum likelihood classifier has the highest accuracy for these difficult classes. For other classes, the accuracy of the neural network is similar and in some cases is superior to that of the maximum likelihood classifier. The neural network also has the smallest classification times. Although the paired histogram method is rarely the most accurate, its performance is similar to that of the other two classifiers, and it is clearly the most flexible method. For all of these techniques, additional efforts need to be focused upon regions of strong sunglint, cumulus clouds, silty water, slush, and thin smoke/haze.

Experiments also clearly demonstrate the effectiveness of decomposing a single global classifier into separate regional classifiers, since the regional classifiers can be more finely tuned to recognize local conditions. Using the full feature set, all of the classifiers produced accuracies between 89 and 92% when trained and tested on global data, a significant decrease from the results reported above. Interestingly, the results were slightly better, 94% for the maximum likelihood, 96% for the neural network, and 91% for the paired histogram technique, when using the "other" data. This demonstrates that the complexity of the classification task decreases when the three difficult regions listed above are removed.

Experiments using the "twilight" data show the varying impact of a large solar zenith angle. The majority of the errors in this data set occurred in distinguishing snow/ice from ice cloud and separating land from water. However, the overall classification accuracies of 92-93% using the full feature set are encouraging, demonstrating that accurate cloud retrieval in scenes with high solar zenith angles is possible.

Feature selection is a critical step in classifier design. A delicate balance is required between providing enough information to perform the classification and providing too much, which can, at best, seriously impact classification speed and, at worst, introduce noise into the classifier. A large number of features are generated for the training data, and those with the greatest potential for identifying particular classes are retained. For the paired histogram and class elimination techniques, features are selected which exhibit the lowest correlation and the highest divergence. A subset of these features, reduced by tighter divergence and correlation requirements, is used by the neural network and maximum likelihood classifiers. This technique is compared to the simpler approach of using the six channels of data for each pixel as the elements of the feature vector. The paired histogram method suffered declines in accuracy of 1-9% using the six-channel data. The maximum likelihood classifier was less sensitive to the change in features selected, with small decreases in accuracy for some of the experiments. The neural network was the least affected, with some small decreases and some small increases. Thus the feature selection method cannot be chosen independently of the classifier to be used. In all cases, classification speed was increased by using the six-channel data. The most significant improvements occurred for the neural network, which experienced as much as a 72% reduction in classification time.
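As a rough illustration of divergence- and correlation-based feature screening, the sketch below keeps a candidate feature only if its poorest pairwise class separation (approximated here by a symmetric divergence between Gaussian fits) exceeds a threshold and if it is not strongly correlated with a feature already kept. The Gaussian approximation, the thresholds, and the greedy ordering are assumptions for this sketch rather than the selection procedure used in the study.

```python
import numpy as np

def gaussian_divergence(x, y):
    """Symmetric, Kullback-Leibler-style divergence between two classes for a
    single feature, assuming Gaussian class-conditional distributions."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var() + 1e-12, y.var() + 1e-12
    return 0.5 * (vx / vy + vy / vx - 2.0) + 0.5 * (mx - my) ** 2 * (1.0 / vx + 1.0 / vy)

def select_features(features, labels, max_corr=0.9, min_div=1.0):
    """Greedy sketch: keep features whose worst-case pairwise class divergence
    exceeds min_div and whose correlation with already kept features stays
    below max_corr.  The thresholds are illustrative assumptions."""
    classes = np.unique(labels)
    kept = []
    for j in range(features.shape[1]):
        col = features[:, j]
        div = min(gaussian_divergence(col[labels == a], col[labels == b])
                  for i, a in enumerate(classes) for b in classes[i + 1:])
        if div < min_div:
            continue
        if any(abs(np.corrcoef(col, features[:, k])[0, 1]) > max_corr for k in kept):
            continue
        kept.append(j)
    return kept

# Tiny synthetic demonstration: two classes, three candidate features.
rng = np.random.default_rng(0)
labels = np.repeat(["a", "b"], 100)
f1 = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])  # separates the classes
f2 = f1 + rng.normal(0, 0.05, 200)                                   # redundant copy of f1
f3 = rng.normal(0, 1, 200)                                           # uninformative
print(select_features(np.column_stack([f1, f2, f3]), labels))        # expected: [0]
```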
No one classifier solved all problems. The maximum likelihood and neural network approaches have comparable accuracies. The maximum likelihood classifier is slightly more accurate for the biomass-burning area, but the neural network has the superior performance for the global data. However, the neural network is the least computationally expensive approach. Considering both efficiency and accuracy, the neural network using regional classifiers and the six-channel data as input is presently the best choice for this task. Although the paired histogram classifier appears to be the least attractive because it was neither the most accurate nor the most efficient, that approach may ultimately prove the most useful. The paired histogram classifier is clearly the most flexible, can most easily incorporate ancillary data as they become available, and is well suited for handling multimodal class distributions. Similarly, the current accuracy of the class elimination technique is insufficient for this project. However, the classification speed would make this a viable option if the accuracy could be increased by 1-2%. Further study of both techniques is warranted, particularly as additional ancillary databases become available.

Acknowledgments. This work was supported by National Aeronautics and Space Administration contract NAS1-19077, which is part of the Earth Observing System (EOS) Clouds and the Earth's Radiant Energy System (CERES) program. Support was also provided by NAGW 3740, which is managed by Robert J. Curran.
B. A. Baum, Atmospheric Sciences Division, NASA Langley Research Center, Hampton, VA 23681.
T. A. Berendes, K. S. Kuo, and R. M. Welch, Department of Atmospheric Sciences, Global Hydrology and Climate Center, University of Alabama in Huntsville, Huntsville, AL 35806.
E. M. Corwin and A. M. Logar, Department of Mathematics and Computer Science, South Dakota School of Mines and Technology, Rapid City, SD 57701.
A. Pretre, Martin and Associates, Inc., 1515 N. Sandborn Blvd., Mitchell, SD 57301.
R. C. Weger, Institute of Atmospheric Sciences, South Dakota School of Mines and Technology, Rapid City, SD 57701.

(Received May 6, 1998; revised July 29, 1998; accepted July 30, 1998.)