Data system design for a hyperspectral imaging mission concept The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters. Citation Hartzell, C.M. et al. “Data system design for a hyperspectral imaging mission concept.” Aerospace conference, 2009 IEEE. 2009. 1-21. © 2009 Institute of Electrical and Electronics Engineers. As Published http://dx.doi.org/10.1109/AERO.2009.4839507 Publisher Institute of Electrical and Electronics Engineers Version Final published version Accessed Wed May 25 18:22:40 EDT 2016 Citable Link http://hdl.handle.net/1721.1/59373 Terms of Use Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. Detailed Terms Data System Design for a Hyperspectral Imaging Mission Concept . Christine M. Hartzell Aerospace Engineering Sciences University of Colorado Boulder, CO 80309 (206) 915-6409 hartzelc@colorado.edu Jennifer Carpena-Nunez Department of Physics University of Puerto Rico San Juan, PR 00931 (787) 381-4171 jennifercarpena@gmail.com Lindley C. Graham Department of Aeronautics and Astronautics Massachusetts Institute of Technology Cambridge, MA 02139 (408) 306-9247 lcgraham@mit.edu David M. Racek Electrical and Computer Engineering Department Montana State University Bozeman, MT 59717 (253) 205-4538 dave.racek@gmail.com Tony S. Tao Aerospace Engineering Pennsylvania State University University Park, PA 16802 (610) 306-2761 tonystao@gmail.com Christianna E. Taylor School of Aerospace Engineering Georgia Institute of Technology Atlanta, GA 30332 (617) 669-6774 christianna.taylor@gmail.com Hannah R. Goldberg Jet Propulsion Lab California Institute of Technology Pasadena, CA 91109 (818) 216-6856 hannah.r.goldberg@jpl.nasa.gov Charles D. Norton Jet Propulsion Lab California Institute of Technology Pasadena, CA 91109 (818) 793-3775 charles.d.norton@jpl.nasa.gov complexity of Earth science increases. The designs presented here are the work of the authors and may differ from the current mission baseline. Abstract—Global ecosystem observations are important for Earth-system studies. The National Research Council’s report entitled Earth Science and Applications from Space is currently guiding NASA’s Earth science missions. It calls for a global land and coastal area mapping mission. The mission, scheduled to launch in the 2013-2016 timeframe, includes a hyperspectral imager and a multi-spectral thermal-infrared sensor. These instruments will enable scientists to characterize global species composition and monitor the response of ecosystems to disturbance events such as drought, flooding, and volcanic events. Due to the nature and resolution of the sensors, these two instruments produce approximately 645 GB of raw data each day, thus pushing the limits of conventional data handling and telecommunications capabilities. The implications of and solutions to the challenge of high downlink data volume were examined. Low risk and high science return were key design values. The advantages of onboard processing and advanced telecommunications methods were evaluated. This paper will present an end-to-end data handling system design that will handle the large data downlink volumes that are becoming increasingly prevalent as the TABLE OF C ONTENTS 1 2 3 4 5 6 7 8 M ISSION OVERVIEW . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . C LOUD D ETECTION T ECHNIQUES . . . . . . . . . . . . . . . C OMPRESSION S CHEMES . . . . . . . . . . . . . . . . . . . . . . . . . C OMMUNICATIONS A RCHITECTURE . . . . . . . . . . . . . G ROUND S TATION S ELECTION . . . . . . . . . . . . . . . . . . . S CIENCE A NALYSIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . DATA S TORAGE AND D ISTRIBUTION . . . . . . . . . . . . . C ONCLUSION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . R EFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 4 5 8 12 14 16 18 19 19 1. M ISSION OVERVIEW The National Research Council’s (NRC) report entitled Earth Science and Applications from Space provides a roadmap of Earth-science missions for the next decade. The HyspIRI c 978-1-4244-2622-5/09/$25.00 2009 IEEE. IEEEAC Paper #1375, Version 4 Updated 1/09/2009. 1 mission, expected to launch between 2013 and 2016, is called for by the NRC to monitor global ecosystem health in order to identify the ecological effects of climate change and urbanization. The nominal mission duration of 3 years will allow analysis of the temporal variation in the ecological properties as well. The science objectives are achieved using two imaging spectrometers that operate in different regions of the electromagnetic spectrum. Due to the high resolutions of the imagers and the global mapping nature of the mission, approximately 6455 gigabytes of raw data will be collected each day. The HyspIRI payload is sized to be approximately 145 kg. A commercial bus, the SA-200HP bus from General Dynamics, will be modified for use on the HyspIRI mission. Notable characteristics of this bus include upgrade options for GPS and the ability to have 3-axis pointing with 16 arcsec pointing control, 0.5 arcsec knowledge, and 0.1 arcsec/sec stability. The bus should provide 2,000 W at end-of-life (EOL) (at 1 AU) and has enough onboard fuel capacity for a 6-year mission. The launch vehicle is assumed to be a Taurus-class rocket [1]. Based on the last TeamX study conducted in July 2007, the HyspIRI satellite will orbit at an altitude of 623 km in a circular, sun-synchronous orbit, which translates into an inclination of 97.0 degrees [1]. The local descending node of the orbit would be 11am [1]. According to the mission scientists, with this local descending node time, there will be the fewest clouds (statistically) obstructing the sensors’ view of the Earth [2]. Science Objectives HyspIRI aims to detect ecosystem responses to climate change and human land management. It intends to monitor and enable prediction of volcanic activity, landslide hazards, wildfire susceptibility, drought, exploitation of natural resources, species invasion, surface and soil alterations, changes in surface temperature, deforestation and ecosystem disturbance. These can be detected by distinctive spectral features in the shortwave, visible, near infrared and infrared regions of the electromagnetic spectrum. Missions Operations A selective instrument resolution scheme was created to reduce the mission’s data volume while retaining image detail in more scientifically-significant areas. By working with the mission scientists, Robert Green and Simon Hook at JPL, geographic areas where the data resolution could be reduced were identified. Imager Characteristics The HyspIRI mission carries two imaging spectrometers: the Visible Shortwave Infra Red (VSWIR) hyperspectral imaging spectrometer and a Thermal Infra Red (TIR) multispectral imager. The two instruments will address independent science objectives. Since the TIR measures the thermal energy radiated by the earth, it will operate during the entire orbit, regardless of the position of the sun. If the sensor is taking measurements of the ocean, the resolution will be reduced by averaging the data over multiple pixels. The spatial resolution over the ice sheet in Antarctica is also reduced to the ‘Ocean’ resolution level. According to Hook, 200 m x 200 m would be a desirable resolution for the TIR over the ocean [3]. Leastsignificant spectral quantization data will be dropped to reduce the quantization resolution to two bits per band since exact spectral data is not as important to scientists as an overall classification and trends in the regions of lower resolution. Cloud data is of interest to scientists in this spectral range, so clouds will not undergo data reduction [3]. The TIR’s resolution modes are enumerated in Table 1 below. Using this scheme, 97% of TIR data volume would be devoted to Land/Shallow Water data. The VSWIR imager will capture sunlight reflected off the Earth’s surface, while the TIR collects thermal energy emitted by the Earth’s surface. The VSWIR instrument will have a 145 km wide swath width. It is a nadir-pointing push-broom instrument, with a spatial resolution of 60m and a repeat cycle of 19 days. The TIR instrument will have a 600 km wide swath width. It is also nadir-pointing, however it has a shorter repeat cycle (5 days) due to its large swath width. The TIR instrument is a whisk-broom sensor with a spatial resolution of 60m. The VSWIR instrument will collect reflected light at 214 spectral bands, from 380nm (violet) to 2510nm (near infrared) with a 10nm spectral resolution. The TIR collects data at 8 bands, 7 bands between 7.3-12.1 µm and 1 band between 3-5 µm. Due to the locations of their spectral bands in the electromagnetic spectrum, the TIR will address science objectives concerning thermal events while the VSWIR will address science objectives related to surface composition and ecosystem information. The VSWIR instrument’s resolution modes differ from those of the TIR instrument. Since the VSWIR instrument requires the reflected light from the sun to operate, the sensor will only be active when the sun-elevation angle is sufficient to produce meaningful data [4]. Current sensor design puts this elevation at 20◦ . Resolution modes for the VSWIR are shown below in Table 2. Clouds imaged by the VSWIR instrument are compressed spectrally since images of clouds in the VSWIR wavelengths provides little scientifically usefully information. Cloud pixels are not compressed spatially since the spatial unpredictability of clouds would make consistent Mission Design Characteristics The preliminary mission design presented here was developed by JPL’s TeamX. TeamX creates integrated preliminary mission design concepts using a collaborative engineering environment. 2 Table 1. Spatial, spectral, and quantization resolutions for the TIR instrument by land classification. Table 2. Spatial, spectral, and quantization resolutions for the VSWIR instrument by land classification. spatial compression impossible. Using this scheme, 98% of VSWIR data volume would be devoted to Land/Shallow Water data. Note that, in contrast to the TIR, the VSWIR keeps its full spectral and quantization resolution over ‘Ocean’ – designated areas. to store hundreds of gigabytes of information until the next ground station is in range. The communications system will have to handle transmission of five terabits of data per day, yet still have a robust enough downlink margin to allow for a missed communications pass. Once the data is on the ground, a series of ground calibrations and processing must be done and the challenge again becomes the huge data volume to be stored. The problem of making the science products readily available and accessible is not easily answered when sifting through hundreds of terabytes of data. Data Volume Challenge Hyperspectral data is not new to earth science remote sensing missions, rather what makes the HyspIRI mission unique is the amount of data being collected, stored, and distributed. The huge data rate, on the order of hundreds of gigabytes per day, is due to the two instruments collecting high resolution data over a wide spectral range. The spectral information, in the case of the VSWIR imager, is quantized into 214 bands each represented by a 14 bit sample. The spectral information collected by the TIR imager is similarly quantized into 8 bands, with each sample being described in 14 bits. With a maximum sample rate of approximately 70Msamples/second, the resulting data stream is substantial. The HyspIRI mission will collect nearly two petabytes over the three year mission life. Figure 1 shows the overall data reduction transferred over the communication system and the data stored on the ground over the three year mission. The data reduction that is accounted for in these graphs includes reductions due to decreasing the instrument resolution when taking data over areas of little scientific interest and onboard compression of data. Shown on the left is the amount of data reduced on-board the spacecraft per orbit, thereby reducing the volume of data that must be transmitted to the ground. On the right is the total data reduction over the three year mission. It can be seen that there is a substantial reduction in data volume due to the compression techniques employed and discussed in detail in this paper. The HyspIRI mission presents a data handling challenge that will push the limits of current processing, storage, and communication technologies. Moore’s law, stating that the number of transistors that can be inexpensively placed on an integrated circuit will double approximately every two years [5], may ease the cost and technology boundaries by the mission’s 2011 technology cutoff. Nevertheless, the downlink capabilities and on-board processing power of the HyspIRI mission limit the amount of data that can be collected and analyzed. The mission utilizes hardware implementation of data processing and compression methods to reduce the stream of data coming from the instruments. The on-orbit data system must be able to process and compress up to 70M samples per second in real time. Different resolution modes, cloud screening, and lossless data compression become vital to the overall data reduction. Without these data reduction methods, the data volume would be well beyond the feasible volume to transmit to earth, meaning a smart combination of these options must be designed and modeled. The spacecraft is physically limited by the on-board solid state storage which must be able Figure 1. Data Reduction Graphs showing the effectiveness of the compression schemes employed by the HyspIRI mission. Given the high data downlink volume, a two-pronged ap3 proach was taken in designing the mission’s data system. First, techniques were employed to reduce the data downlink volume. Once those techniques were exhausted, a data system architecture was designed to accommodate the resulting data downlink volume. Five topics associated with data downlink volume reduction and accommodation were investigated in further detail and presented here. In attempts to reduce the data downlink volume, cloud detection and hyperspectral data compression algorithms were identified. In order to accommodate the resulting data downlink volume, communications architectures were analyzed and ground stations were identified. Additionally, preliminary science products were identified and a ground-based data storage architecture was developed. scattering effects. Further analysis, possibly in the form of an Observing Simulated Systems Experiment (OSSE), could investigate the effect of this omission on the quality of the science returns. 2. C LOUD D ETECTION T ECHNIQUES An on-board cloud detection algorithm is required for the HyspIRI mission in order to reduce the data downlink volume. While the cloudy TIR data has science value, there is little benefit to returning cloudy data in the spectral range of the VSWIR instrument. Since clouds cover nearly 30% of the globe, the total mission data downlink volume can be reduced by nearly 96 GB (13%) per day by applying additional compression to the cloudy pixels. In order to detect clouds, the raw data must first be converted to the top-of-atmosphere (TOA) reflectance. There are several detection algorithms that can be compared and evaluated based on the needs of the mission. Finally, once clouds have been detected, the cloudy pixels will undergo lossy spectral and bitwise compression, before being further compressed by the main compression algorithm. Note that clouds will only be detected when the instrument is over land or coastal areas, due to the selective resolution of the instrument. A preliminary cloud detection algorithm was applied to sample hyperspectral data collected by an air-borne imaging spectrometer (JPL’s Airborne Visible/Infrared Imaging Spectrometer (AVIRIS)) as a proof-ofconcept trial. Figure 2. Calculation of Reflectance for the Equivalent Solar Zenith Angle (θz ). Reflectance is the sensed radiance divided by the normal component of the solar radiance for the zenith angle. Types of Cloud Detection Algorithms The cloud detection algorithm selection was driven by two main requirements: the algorithm must be able to process data in real-time on-board the spacecraft and it must distinguish between cloud and non-cloud pixels with sufficient accuracy. Specifically, since the cloudy pixels will be lossily compressed, the cloud detection algorithm must be clear-sky conservative; it is more important to avoid erroneously classifying pixels as cloudy than to detect all the cloudy pixels. Preliminary On-board Processing On first glance, the problem of cloud detection seems simple, given the characteristic shapes and brightness of clouds. However, automated cloud detection poses a challenge since classifiers must be able to discern on a pixel-by-pixel basis cloud from non-cloud. With respect to the high reflectance of clouds, in hyperspectral imagers it is a challenge to determine at which wavelength to look for a characteristic cloud signature since smoke, ash plumes, snow and highly-reflective sand can be easily mistaken for clouds. Often cloud detection algorithms are applied once the data has been sent to the ground, where there are fewer constraints on processing time and memory requirements. In order to apply a cloud detection algorithm, the raw VSWIR data must be converted to TOA reflectance. First, the dark level signal will be subtracted from the raw data. Then the data is converted to radiance. Finally, the radiance is converted to TOA reflectance for the equivalent solar zenith angle by dividing the sensed radiance by the normal component of the solar radiance for the equivalent zenith angle (see Figure 2). Similar on-board calibration was done during the Hyperion instrument’s extended mission on the Earth Observing-1 satellite [6]. The equivalent solar zenith angle is calculated on-board using the spacecraft’s ephemeris data. Although, it is theoretically necessary to recalculate the zenith angle for each pixel, it is predicted that impact of recalculating this value less frequently will be minimal. Further analysis should address the implementation of this calibration process onboard the spacecraft. Also, it is important to note that this calibration protocol does not take into account the atmospheric Cloud detection is a subset of current research surrounding automated pixel classification. There are several types of algorithms that can be used for pixel classification. A comparison of four types of pixel classification, used for the purpose of autonomous event identification, is given in [7]. A more detailed look at cloud detection algorithms is given in [8]. 4 On-board cloud detection was previously implemented on the Hyperion instrument of Earth Observing-1 (EO-1). An algorithm containing a series of thresholds was initially implemented on the instrument (see [9] for details) for cloud detection. During Hyperion’s extended mission, a Support Vector Machine algorithm was used for pixel identification [6]. Due to the heritage of these two algorithm types on-board Hyperion, they were selected for further investigation with respect to implementation on the HyspIRI mission. detection algorithms that could be used by HyspIRI. A simple threshold algorithm was applied to the sample AVIRIS scenes: if ρ1.67µm ≥ 0.5, then the pixel is cloudy, where ρx is the reflectance at wavelength x. The threshold was applied at a wavelength (1.67µm) where clouds have high reflectance values and there is a low sensitivity to the presence of water. The algorithm was first applied to an AVIRIS image from South-Central Pennsylvania (39.96◦ N, -78.32◦ E) taken July 2, 2008. This image is an obviously cloudy, agricultural scene. The accuracy of the cloud mapping algorithm can be seen in Figure 3. It can be seen that most of the clouds are detected by the algorithm. Figure 3b, a subset of Figure 3a, shows that the algorithm does not very accurately detect smaller, less bright clouds. Unfortunately, no quantitative analysis can be done on the results from this classification because there is currently no “truth” data - i.e., the image has not been manually labeled for clouds. However, the preliminary qualitative results from this image indicate that the classifier is performing as desired, that is to say, it is clear-sky conservative. Threshold cloud detection algorithms operate like a series of logic gates. Pixels are labeled as clouds if their reflectances are within a given range for a given wavelength or set of wavelengths. At its simplest, a threshold algorithm can take the form: if ρ1.67 ≥ 0.4, then the pixel is cloudy. A series of six thresholds was initially implemented on Hyperion for cloud detection [9]. Multiple thresholds were used in order to reduce the classification error over desert, snow, ice and highly reflective water. The advantage of threshold algorithms is that their simplicity results in fast processing times. However, threshold algorithms become impractical if attempting to label more than one class of pixels. Support Vector Machines (SVMs) are more capable classifiers than threshold algorithms, although this increased capability comes with a cost of increased complexity. SVMs can be used to label multiple classes of pixels, for example snow, water, ice, land and cloud. An SVM is a machine learning algorithm that maps data into a higher dimensional space and then classifies it based on individual data points’ positions relative to separating hyperplanes [10]. The SVM must be trained using sample data to distinguish between the different classes of pixels. This can be accomplished through user-defined pixel labeling or clustering methods, which classify the pixels based on the desired number of classifications. SVMs also vary in complexity based on the way they separate the data in multidimensional space. A more detailed explanation of the theory behind SVMs can be found in [11]. On EO-1, an SVM using a linear kernel was implemented to autonomously detect science events of interest, such as volcanic eruptions and ice sheet breakup. Similar capabilities could be implemented on HyspIRI, although they are not specifically required by the current science objectives. The algorithm was also applied to a desert scene from Central Nevada (38.52◦ N, -117◦ E) taken on June 11, 2008. As mentioned, sand, which can be highly reflective, is often difficult to differentiate from cloudy pixels. For this scene, which contained no clouds, the classifier erroneously labeled 0.11% of the pixels as cloudy. The cloud detection must be applied to more data samples in order to determine the average classification error. Additionally, the minimum classification accuracy that is required should be defined by the science community. The required algorithm accuracy may vary based on the ground cover due to the mission’s science objectives. For instance, incorrectly classifying pixels as clouds over the desert will have a limited impact on the global plant species map science product. The HyspIRI mission will use a threshold algorithm to detect clouds in data collected by the VSWIR instrument. Preliminary work presented here has shown that threshold algorithms have suitable accuracy in detecting clouds. Once the cloudy pixels have been detected, they will be lossily compressed by reducing the spectral and radiometric resolutions of the data. A threshold cloud detection algorithm has been baselined for the HyspIRI mission due to its simplicity. Since the classifier is only required to classify pixels into two classes (cloud and non-cloud), a threshold algorithm can meet the mission’s needs. 3. C OMPRESSION S CHEMES Given the large volume of data to be produced by HyspIRI, on-board data compression is critical to enabling the mission. When both instruments are operating in full resolution mode, the HyspIRI onboard processing system must be able to process up to 70Msamples/second in real time before the data can be sent to the solid state recorder and prepared for downlink by the communications system. Based on the last JPL Team X design, onboard data storage is limited to 1.6 Terrabits, or 200 Gigabytes, of solid state memory [1]. This memory constraint, along with the maximum downlink rate capa- Preliminary Algorithm Results In order to demonstrate the feasibility of the proposed preliminary calibration and threshold cloud detection algorithm, the process was applied to two scenes of AVIRIS data. Although AVIRIS data does differ slightly from the data that would be collect by the VSWIR instrument, it is similar enough in terms of wavelengths of spectral bands to be used to test the 5 Figure 3. Cloud Detection Results for Pennsylvania AVIRIS Data. a) shows the full image swath, with true color on left with the cloud detection algorithm results on the right. b) is a close-up from same data set. It can be seen that the algorithm has some difficulty detecting small clouds and cloud edges. bilities are the two main drivers of the need for onboard real time data compression for the HyspIRI mission. The specific requirements on the compression algorithm as well as a comparison between select algorithms will be presented. because they are of less scientific interest). Since the instrument is designed to collect high resolution data, the compression should not degrade that resolution. 2. Does the algorithm function fast enough to be implemented in real time with a high throughput data rate? Compression Algorithm Requirements The compression algorithm was chosen in response to three driving requirements: The speed requirement of the algorithm is a simple result of the data rate at which the sensors will collect data. An algorithm that is fast will have low computational requirements. A fast algorithm will also utilize hardware implementation, meaning all of its operations can be easily described in a hardware description language and implemented on an integrated circuit or field programmable gate array (FPGA). 1. Does the algorithm effectively compress hyperspectral data in a lossless manner? As expected, it is desirable to implement the highest degree of compression possible without losing the scientific usefulness of the data. Lossless compression allows a reduction of data volume while still maintaining the ability to exactly reconstruct the original file. In this application lossless compression is necessary to preserve the scientific integrity of the data collected. Among the things that set this mission apart from previous remote sensing missions is the fine spatial resolution of the instruments. Therefore, in order to produce a superior end-level data product it is important that at no point is the data compressed in a lossy manner (Note that this approach is for the land and shallow water regions only. Cloudy areas taken by the VSWIR instrument are lossily compressed 3. Can the algorithm be implemented on-board? Since the algorithm will be implemented on-board the spacecraft, it must have low computational and memory requirements. Each sample will require some number of computational operations when being compressed; this number should be on the order of tens of operations per sample [12]. In addition, multiplication operations take a larger number of clock cycles so an efficient algorithm will be optimized to use few multiplication operations and more add, subtract, and bit shift 6 operations. cloud map is then sent to the solid state recorder. If the compressor is compressing a cloudy pixel, the pixel is initially lossily compressed, then additional lossless compression is applied. The initial compression works by first averaging the 214 spectral bands to 20 and reducing each sample from 14 to 4 bits. If the compressor is compressing a non cloud pixel, only the lossless compression algorithm is applied. The compressed data stream is sent to the solid state recorder, where it will be stored until it can be transmitted to a ground station. The compression algorithm will be implemented on a field programmable gate array (FPGA). Fast Lossless Compression Algorithm Five lossless compression algorithms were compared in terms of compression ratio by Aaron Kiely at JPL (see Table 3). The data compression algorithms were tested using calibrated 1997 16 bit AVIRIS data from the Jasper Ridge, Lunar Lake and Moffett Field datasets. The table shows the average number of bits/sample to which the data was reduced. The Fast Lossless algorithm, developed by Matt Klimesh at JPL, had the lowest number of output bits per sample and the overall best performance. The average compression ratio achieved by the Fast Lossless algorithm for these datasets was 3.1:1. Traditional algorithms that have previously been used on hyperspectral data produced a higher average bits/sample and less effective compression ratios [13]. For detailed information on the Fast Lossless Compression algorithm, please see [12]. Figure 4. Data Flow through the compressor for the VSWIR instrument. Hardware Implementation The motivation for hardware implementation of the onboard compression algorithm is speed. Hardware compression may perform as much as five times faster than software compression for data of this type [12]. To implement the algorithm in hardware, it is translated from C code (executed as a list of software instructions) to a hardware description language (executed as a propagation of signals through circuits). The low complexity and minimal computational requirements of the Fast Lossless compression algorithm make it very well suited to be implemented using an FPGA. Table 3. Comparison in terms of resulting bits per sample after processing by compression algorithms. Initially, all samples were initially expressed with 16 bits. In addition to supplying an advantage in terms of compression ratio over other algorithms, the Fast Lossless Compression Algorithm is also well suited for implementation on an FPGA due to the algorithm’s low number of calculations per sample. The memory requirements are also low, requiring only enough memory to hold one spatial-spectral slice of the data (on the order of hundreds of Kbytes) [12]. Work to implement the Fast Lossless Compression Algorithm on an FPGA is being done at JPL. The algorithm was tested using a Xilinx Virtex 4 LX160 FPGA mounted on a custom made evaluation board and preliminary test results demonstrated a throughput data rate of 33 Msamples/sec at a system clock rate of 33MHz. The algorithm used approximately ten percent of the chips’ available programmable logic and interconnect resources [12]. The VSIWIR instrument will collect samples at a maximum rate of 60 Msamples/sec, the TIR will collect at a maximum of 10 Msamples/sec, for a combined maximum sample rate of 70 Msamples/sec, which should be within the capabilities of the board. Figure 4 shows the data flow from the instrument to the solid state recorder in the case of the VSWIR instrument. Data from the sensor is initially calibrated to reflectance values which can be analyzed by the cloud detection algorithm. Calibration data is sent to the solid state recorder. The cloud detection algorithm generates a cloud map that becomes an input to the compressor. The cloud map can be thought of as a matrix of cells that correspond to a spatial location on the image. Each cell holds one bit to tell the compressor if the current pixel is a cloud pixel or a non cloud pixel. The FPGAs also provide appealing features such as high density, low cost, radiation tolerance, and functional evolution [14]. As science goals change or as spacecraft capabilities are limited, the ability of an FPGA to be reprogrammed from earth allows for the functional evolution of hardware thought the 7 life of the mission. For the HyspIRI mission this means that if a new compression algorithm with a higher compression ratio or a more accurate cloud detection algorithm with better cloud classification is developed, the new technology could be implemented on the satellite with relative ease. In March 2008, Xilinx Inc., the main manufacturer of space ready FPGAs, has released the Radiation-Tolerant Virtex-4 QPro-V Family, providing good options for radiation-tolerant FPGAs with the speed and processing power required for the HyspIRI mission [15]. Thus, based on a comparison of available compression algorithms and the HyspIRI missions requirements, an FPGA implementation of the Fast Lossless Compression Algorithm is the recommended compression scheme for the mission. that clouds will only be identified on the VSWIR instrument when it is viewing LSW regions. Some key elements of the model are shown in Table 4. 4. C OMMUNICATIONS A RCHITECTURE Once the data has been compressed, it will be transmitted to a series of ground stations using a given communications protocol. X-band, Ka band, TDRSS, and LaserComm architectures were compared in terms of their ability to accommodate the volume of data to be transmitted, their cost and mass, and their heritage level. An X-band architecture utilizing three ground stations was chosen. Data Flow Modeling Table 4. Important values and assumptions in the Data Flow Model. In order to design the communications system, it was necessary to determine the desired data volume per unit time that must be transmitted from the satellite to the ground. To do this, a data flow model was created to calculate the data rate and volume that would be created aboard the spacecraft as seen in Figure 5. The bits per orbit for each of the instrument and modes were then summed over one day. Note that the VSWIR is considered to be ‘on’ for half the orbit; the actual duty cycle would be less due to the minimum 20◦ sun angle. Therefore, the model will slightly overestimate the data taken over the ocean. The results of the model are shown below in Table 5 with comparative data volumes shown in Figure 6. One day was chosen as the time unit because it allows for a reasonably accurate estimate over multiple orbits, thereby reducing fluctuations in data generation due to non-uniform geography. The data volumes shown are data outputs for the instruments. A 10% overhead amount was added to account for metadata, calibration data, cloud map, and packetization. This average downlink data volume per day of 5,097 Gbits was used to size the communication system. As the data flow diagram (Figure 5) shows, the model takes into account the satellite orbital parameters, the desired sensor configuration (swath width, trackwise and swathwise pixel resolution), spectral resolution (number of bands), intensity resolution (bits per band), compression ratio, and a model of Earth’s geography. Geography is important since the sensors’ data output rates will vary according to the sensors’ target (Land/Shallow Water (LSW), Cloud, or Ocean). To determine the percentage of time the sensors will view the features, a map was overlaid onto a Satellite Orbit Analysis Program (SOAP) model of the satellite developed by Francois Rogez at JPL. The map determined the satellite’s sensor mode – whether it should record with LSW or Ocean resolutions. By running this model over a 19-day period at 5minute resolution (as to account for the entire repeat cycle), the percentage of time the satellite would collect data from the designated regions (LSW or Ocean) was identified. Communications Architecture Selection The main driver in the selection of HypsIRI’s communication architecture was the large data downlink rate required. Because the HyspIRI mission aims to provide consistent, lowrisk operations, redundancy and heritage were considered to be major selection factors. Cost and masses of various architectures were also considered. Table 7 shows a comparison between the architectures considered. Since the locations of clouds are unpredictable, a statistical average was used to predict the time the sensor would be viewing clouds. With an 11 AM local descending node, the expected cloud cover is approximately 30%. Estimating a 2-in-3 detection success rate for pixels that are cloudy, we estimate 20% of LSW areas will be labeled as ‘cloud’ [2]. Note Ka band frequencies (between 18-40 GHz) are often used in communications satellites. These frequencies are chosen be8 Figure 5. Simplified operation of the data flow model Table 5. Data sum over one day of normal science operations. The added 10% overhead accounts for the required added metadata. Table 6. Link Budget for X-band to Svalbard GS. cause of their ability to provide high gain as well as high data rates with smaller physical structures. The Ka band was ruled to be unfeasible because most ground stations do not support Ka band, and the use of Ka would mean introducing a new standard to multiple ground stations while modem upgrades to the exisiting X-band receiving systems were ruled more 9 un-upgraded ground stations and increase the data envelope should the upgraded ground stations (at 800 Mbps) fail. Additionally, many high-latitude ground stations such as Svalbard and Fairbanks, Alaska [19], [20], [21] are compatible with the X-band frequencies. Power Required Data Rate Data Margin Cost X-band 200 W 800 Mbps 28.5% $9.5M TDRSS 320 W 300 Mbps 13.3% $11.2M LaserComm 40-60 W 6 Gbps 23.7% $20-30M Table 7. Comparison of the different communications architectures considered. Based on these characteristics, an X-band-based architecture was baselined for the mission. X-band Architecture Details Figure 6. Comparative sizes of the data volumes. Note that LSW (VSWIR) and LSW/Cloud (TIR) represent 98% and 97% of the overall data volume, respectively. High data rate X-band heritage—Two LEO imaging satellites were benchmarked to find a starting point for the design of the X-band communications system. The first is the DigitalGlobe WorldView-1 satellite, launched in September of 2007. Using two 400-Mbps X-band radios, it achieves an overall data rate of 800 Mbps [18]. The other is the GeoEye-1 Satellite, which will be launched in September of 2008, which is specified to be capable of 740 Mbps and 150 Mbps [22]. cost-effective. “TDRSS is a communication signal relay system which provides tracking and data acquisition services between low earth orbiting spacecraft and NASA/customer control and/or data processing facilities” [16]. TDRSS, while not supporting as high rates as can be achieved by X-band, is capable of continuously operating for long link times (>20 minutes per orbit). TDRSS was investigated due to its long link times and its convenient ground station location at White Sands. However, using TDRSS with the current return link data rate would require more power from HyspIRI as well as more complicated structural requirements due to the need for a larger antenna to meet Effective Isotropic Radiative Power (EIRP) requirements on the HyspIRI-to-TDRSS-side. The DigitalGlobe WorldView-1 is of special interest because of its data rate. The satellite is specified to collect 331 gigabits of data per orbit, which is extremely close to the 343 gigabits per orbit expected to be collected by HyspIRI mission [18]. Spacecraft Segment for X-band—To achieve 800 Mbps transmission, the satellite would use two 400 Mbps radios simultaneously. To accomplish this at the same band, the signals would be linearly and perpendicularly polarized [1]. The WorldView-1 satellite, which operates at 800 Mbps, uses the L-3 Communications T-722 X-band radio, which operates between 8 and 8.4 GHz. The radio has an output power of 7.5 W with an OQPSK modulation scheme, which increases bitrate from BPSK without introducing error associated with higher-order PSK [23]. To determine if this system would be feasible for HyspIRI, a link budget was created, as can be seen in Table 6. After determining the desired acquisition location, the link geometry (see Table 8) was used to conduct link budget sizing using a sample link budget from David Hansen at JPL. The budget was set up to produce a minimum link margin of 6 dB with the available 7.5 W that is specified by the T-722 radio [24]. The budget is for one radio (each radio requiring data rates of 400 Mbps); since the other will be identical but perpendicularly polarized, the budget applies for both radios. LaserComm is a relatively new technology under development in labs, including JPL. LaserComm has the potential to transmit data at extremely fast rates (up to 10 Gbps), making it attractive to the HyspIRI mission [17]. While the link speed has been demonstrated, the link rate does not seem to be supportable due to the data rate capabilities of the Solid State Recorder (SSR). Faster storage media must be developed before the satellite can support such a high data rate. It also remains the most expensive of the communications system options. X-band was deemed to be the best solution overall for HyspIRI. The X-band system is estimated to be the least expensive of the three options while having excellent heritage in the DigitalGlobe WorldView-1 satellite both for data rate and data volume [18]. X-band also provides HyspIRI with the ability to reduce the data rate (as GeoEye-1 does) to utilize As this rough link budget shows, Svalbard ground station is theoretically capable of 10 dB more gain than is necessary to 10 GS Link Elevation Max Slant Distance Off-Nadir Gimballing Angle 10◦ above horizon 1984 km 64◦ Table 8. Link Geometry for 623 km. Note the Gimballing angle must be 64 degrees; this has been ruled feasible with a 0.25m-dia parabolic antenna as noted in Table 6. make the link. Therefore, there is room to reduce the power, decrease the size of the antenna, or use smaller receiver antennas on the ground (and gain access to more ground stations) while maintaining a robust link. Physically, the radio system will require precise gimballing and rotational control to align the polarization with the receiver antenna at the ground station. The SA-200HP bus that is baselined for the mission has sufficient spacecraft pointing knowledge to supply for the 0.5◦ pointing error assumed in the link budget. The mass of the X-band system is shown below in Table 9. Item Radio Antenna and Gimballing S-band Rx Coax Switch Total Mass Figure 7. Each radio will downlink a single “File Bundle at a time. Mass 3.9 kg each 2.5 kg 2.5 kg 0.4 kg 12.2 kg these stations, no station supports data rates higher than 150 Mbps while most support approximately 100 Mbps [21]. Table 9. Mass Budget for X-band Architecture. Therefore, to use the 800 Mbps link, the ground stations must be upgraded from their current speed with new electronics and data storage. For the purposes of communications system sizing, it was assumed that with a combination of ground stations, it would be possible to provide an average of 10 minutes of downlink time per orbit. This could be accomplished by upgrading two to three high latitude ground stations. Because the two radios will be downlinking from the same pool of onboard data, it is necessary to create a file splitting scheme to downlink the data (see Figure 7). Since the instrument compression will be reset every several lines for error containment, the data that is processed will be separated by geographical segments. These segments will each correspond to a “file bundle” which contains the instrument data, calibration data, and cloud map data associated with that segment’s geographical location. Metadata in the files will be used to designate the location the file bundle correlates to. Each file bundle will be assigned to a radio so that at any one moment, two file bundles are transmitted together. During transmission, the next file bundle will then be queued for the next available radio. Since each file bundle contains metadata about its location, the data can be pieced back together at the ground station. As a contingency option, it should be feasible to decrease the data rate of HyspIRI’s communications system to 100 Mbps to utilize more ground stations for additional coverage time and data volume, or as a backup in case one of the high-rate ground stations is unavailable. Data Volume for X-band Communications— At 800 Mbps transfer rate with an average of 10 minutes of contact time per orbit, the X-band system will be able to downlink 7,125 Gb while the satellite should generate 5,097 Gb on average. This scenario yields an overall data margin of 28.5%. Ground Segment for X-band— Multiple NASA ground stations support X-band communications with current antennas. These ground stations include high-latitude ground stations Svalbard, Fairbanks, TrollSat, McMurdo, TERSS, and many others. These ground stations have dual-polarizing parabolic antennas, which would be necessary to support the downlink radio frequency structure. While this band is supported at Cost of X-band Communications—The estimated cost for Xband communications is shown in Table 10. Satellite communications costs are taken from the last TeamX report conducted in 2007. Ground station costs are estimates after consulting Joseph Smith at JPL [25]. Dedicated data line costs were estimated at $1M each over 3 years. 11 Table 10. Estimated cost of the X-band system. 5. G ROUND S TATION S ELECTION As mentioned previously, there are several ground stations that are compatible with X-band communications frequencies. It was necessary to determine how many and which ground stations should be used in order to accommodate the HyspIRI mission’s large data downlink volume. Analysis Methods Twenty ground stations were considered in this analysis. It was assumed that the ground stations would be upgraded to accommodate the 800 Mbs data rate required by the HyspIRI mission. A driving factor in the ground station selection process was the amount of onboard data storage available, currently 1.6Tb based on Team X sizing[1]. A Satellite Orbit Analysis Program (SOAP) scenario was created to provide the downlink durations over the entire 19-day period for each instrument. PERL scripts (created by Francois Rogez at JPL) translated the downlink durations into the amount of data being stored onboard the spacecraft based on the data downlink and uplink rates. Table 11. Target Values of Objective Function Attributes Four scenarios were created by varying the values of the weighting coefficients used in Equation 1 (see Table 12). The scenarios were run using one, two and three ground stations. The ground station combination that minimized the objective function was then chosen as the baseline configuration. Results While the onboard storage capabilities of the spacecraft limited the possible ground station options, other factors also influenced the selection of the ground station configuration. To account for these factors, a decision making tool was created. With respect to the onboard storage, the decision making tool took into account the peak volume, the slope (volume vs. time), and the average volume of the data stored over the 19 day repeat cycle. In addition to these characteristics, the configurations were also evaluated based on the cost associated with upgrading the ground stations, the assumed reliability, and the political control of each ground station. The objective function is shown in Equation 1. The T subscript represents the target and the A subscript represents the actual value. The objective function target values are given below in Table 11. objf unc = T −M axP eakA α M axP eak M axP eakT T −SlopeA +β SlopeSlope T AvgV alT −AvgV alA +χ AvgV alT +δ(Reliability − 1) +(P oliticalControl − 1) alGS +φ numGS−AvgV AvgV alGS It was found that the HyspIRI mission’s data volume is too large to be accommodated by just one ground station. For the scenarios generated using just one ground station, the amount of data stored onboard the spacecraft increased linearly over the 19 day collection cycle, clearly violating the 1.6Tb onboard data storage constraint. The amount of data collected by the HyspIRI mission is also too large to be accommodated by two ground stations: again, the amount of data stored onboard increases almost linearly over the 19 day repeat cycle. With three ground stations, however, there are several feasible ground station combinations for this mission. Of the 20 possible ground station combinations, there are 11 solutions where the onboard data storage volume varies sinusoidally (see Figure 8). Note that the trial numbers correspond to the ground station numbers shown in Table 13. However, once the 1.6Tb onboard data storage volume constraint is included in this analysis, there are only two feasible ground station configurations (see Figure 9). The configurations that meet the onboard data storage requirements are Fairbanks, McMurdo and TrollSat and Fairbanks, Svalbard, and TrollSat. (1) where: α+β+χ+δ++φ=1 (2) 12 Figure 8. Close-up Time series of amount of data stored onboard for scenarios using three ground stations upgraded to 800 Mbps. Figure 9. Three ground stations upgraded to 800Mbps solutions with 1.6Tb onboard storage constraint. 13 Table 12. Objective Function Coefficient Scenarios More advanced HyspIRI mission design conducted at JPL by Team X has now expanded the available onboard data storage to 3Tb. Sensitivity Analysis Given that only two combinations of three ground stations do not exceed the onboard data storage limits, it is necessary to understand the sensitivity of our results to the onboard data storage limits. This was accomplished by varying the weighting coefficients of the objective function (see Table 12). Table 14 shows the results for each of the scenarios. 6. S CIENCE A NALYSIS In order to more fully understand the requirements associated with the data throughout the mission, it was necessary to define data levels both on-board the spacecraft and on the ground. By defining the form and scientific significance of the data at each stage, a greater understanding of the data system’s role was gained. Additionally, by gaining a greater understanding of HyspIRI’s data products, it was possible to identify missions with which HyspIRI could produce complementary science products. Table 14 shows that, for every scenario, TrollSat was included in the best ground station combination. Fairbanks was the second-most favored ground station with McMurdo and Svalbard tied for third place. Sweden and Tasmania were never chosen for the best combination. This does not mean that they were not feasible solutions. However, with the given weighting functions, they were never chosen as the best solution. Data Level Definition The optimal ground station configuration from Scenario 2 (Fairbanks, McMurdo, and Trollsat) was baselined for this mission due to its emphasis on the 1.6Tb maximum onboard data storage constraint. Definitions for data levels follow (see Figure 10). Level 0 Raw Data: HyspIRI will collect two types of raw data: the science data from the two sensors and the calibration data. As the sensors collect photons, they will output analog signals resulting from the light intensity levels as well as the sensors’ material properties. The calibration data is obtained through lunar, blackbody, solar, and dark level calibration. Lunar and blackbody calibration will be conducted by satellite pointing. Solar calibration will be conducted by reflecting solar light off the instrument cover. Dark level calibration will be conducted by closing the instrument cover. Ephemeris data will be obtained through the onboard GPS and star-sensors. The calibration data will be needed on-board the spacecraft in order to conduct the preliminary calibration required for cloud detection. Preliminary analysis was conducted into the sensitivity of the system to missing a downlink opportunity. This analysis was conducted for the ground link scenario experienced on the second day of the spacecraft’s 19 day repeat cycle. Also, this analysis does not look at the possibility of loosing contact with several different ground stations over the course of a day. Preliminary results are shown in Table 15 and a more detailed discussion can be found in [26]. It can be seen that virtually all of the cases considered exceed the 1.6Tb of available onboard storage. More work needs to be done in this area to adequately address the risk of missing downlink opportunities throughout the mission. Level 1A Roughly Calibrated to Reflectance Data: Table 14. Objective Function Results Table 13. Ground Station Reference Numbers 14 Ground Station Fairbanks Trollsat Svalbard Number of Passes Missed 1 All (8) 1 All (10) 1 All(10) Max Required On-board Storage 2 Tb 2.7 Tb 1.5 Tb 3.7 Tb 1.7 Tb 3.8 Tb Table 15. Amount of storage required when downlink opportunities are missed. These preliminary results were computed on the second day of the spacecraft’s 19 day repeat cycle. All passes missed indicates that all the passes over the ground station were missed for a given day. Figure 10. The data level definitions are presented. The chart flows from left to right, top to bottom through levels 0, 1A, 1B, 1C, 2, 3 and 4. These visual examples belong past mission imagery. Visual examples for levels 1A, 2, and 3 belong to AVIRIS data and calibration charts, while visual examples for Level 4 belongs to ASTER data. Once the data is converted to digital number values, it will be calibrated using the dark level calibration values to correct for sensor offsets due to material flaws. During dark level calibration the sensors should not detect any signal, thus any signals from the sensor indicate an erroneous offset. This dark-level-calibrated data must then be converted from sensor readings to spectral radiance values. This conversion process takes inputs from solar and satellite zenith and azimuth angles, radiometric information (stored onboard in coefficient appendices), and top-of-atmosphere effects (scattering). The data will be tagged with universal collection time, and latitude and longitude. then generate reflectance values. This conversion must be completed onboard because reflectance allows for the detection of clouds for the VSWIR instrument. Level 1B Compressed, Packetized Data: Level 1B data consists of compressed, packetized data. Cloudy pixels undergo a lossy compression and a lossless compression while non-cloudy land-and-shallow-water pixels undergo only lossless compression. Ocean pixels will undergo lossy compression. Because there is little scientific value gained from VSWIR images of clouds, cloud maps are generated to enable higher compression for clouds. In the case of the TIR, all land-and- Radiance values that have been atmospherically corrected can 15 shallow-water data, regardless of cloud presence, will be subject to lossless compression. Compressed Level 1A data is then packaged alongside the ephemeris metadata to become Level 1B data, which is downlinked to ground stations. Packetization is done such that communications errors causing the loss of one packet will not affect the data integrity of other packets. sions, and ground-based measurements. These will permit the creation of high-impact science products. It is possible that the HyspIRI mission could collaborate with the Aerosol and Cloud Ecosystems (ACE) mission, which is also motivated by the NRC Decadal Survey. By collaborating with ACE – a mission scheduled launch between 2013 and 2016 – more accurate predictions of aquatic ecosystem responses to acidification and organic budget changes can be obtained from disturbance effects on distribution, biodiversity, and functioning of ecosystems (HyspIRI science products) as well as ocean CO2 sinking and acidity changes (ACE science products). Also, correlation between ecosystem changes and ocean organic measurements can be studied by combining disturbance effects on distribution, biodiversity, and functioning of ecosystems (HyspIRI science products) and oceanic organic material budgets (ACE science product). Through collaboration with other mission’s the impact of HyspIRI’s data products will be increased. Level 1C Decompressed, Depacketized Data: Once data packets are received on ground, the data is depacketized and decompressed. The packets are then assembled to create full-resolution, continuous data sets as Level 1C data. This data is backed up and will be stored permanently, as is described below. Level 2 Atmospheric Corrected Data: In order to generate Level 2 data, instrument measurement values (previously calibrated) are resampled, remapped, and recalibrated with more sophisticated and accurate methods. Data is also geometrically calibrated and orthorectified. Further corrections may involve cosmic ray compensation and bad pixel replacement. Spectral signatures are identified. A series of high-level information (physical, optical, radiance, and other derived elements) can be viewed at this level. Spectral signatures (derived from reflectance), surface radiance, temperature, emissivity, and composition are available at this level. HyspIRI’s many science products produced at the various data levels will add to the large volume of data that must be stored once the data is sent to the ground. Additionally, clear definitions of the science products and data levels will enable the creation of an efficient and accessible ground data storage system. 7. DATA S TORAGE AND D ISTRIBUTION Level 3 Mapped Data: Geophysical parameters are mapped onto uniform space grids located in space and time with instrument location, pointing and sampling. Volcanic eruptions, landslides, wildfires, natural resources, drought, soil and vegetation type, will be mapped. Gridded maps are created for each image. Files may be located regionally. A series of overlapping data per orbit may be encountered. All files will be made available, including overlapping files. Long-term data and science product storage is becoming an increasingly relevent issue in data system design. Afterall, the worth of the mission is based on the accessibility and usability of the data once it has been sent to the ground. The combined data collection rate of HyspIRI’s two instruments exceeds 1.5TB/day and nearly 0.56 PB/year. The raw data ingest rate alone would increase the current Earth Observing System Data and Information System (EOSDIS) daily archive growth by 48%, not including the storage of science products. The HyspIRI Adaptive Data Processing System (HIIAPS) is a proposed data processing architecture based on the Atmospheric Composition Processing System (ACPS) with minor modifications. HIIAPS would handle the challenge of archiving, processing, and distributing such an immense amount of data while keeping costs low. Level 4 Temporal Dependant Data: As the instruments finish their repeat cycles, they will produce global frames of science products; 5 days per frame for the TIR and 19 days for the VSWIR. With these frames, it is possible to characterize the earth in terms of the science data products every repeat cycle. These frames will be compiled based on collection time, producing a 4-D model of the earth which can be used to view local or global changes as a function of time. This 4-D model would be made available on a monthly, seasonal, or yearly basis. Lessons Learned from EOSDIS The Earth Observing System (EOS) Data and Information System (EOSDIS) began development and implementation in the early 1990s and has been fully operational since 1999 [27], [28], [29]. EOSDIS is the primary information infrastructure that supports the EOS missions. The responsibilities of EOSDIS could originally be divided into two sectors: mission operations and science operations [30], [27], [28], [31]. Mission Operations handles satellite flight operations, data capture, and initial processing of raw data. Most of the Mission Operations responsibilities are now handled by Earth Science Mission Operations (ESMO) [30], [32]. Sci- Synergy with Other Missions The HyspIRI mission will produce science products for various monitoring and early warning programs. If combined with science products from other missions, the HyspIRI science products can serve for even greater societal benefit and scientific interest. HyspIRI’s science products can be combined with missions that will be collecting data simultaneously around 2013-2016, as well as with past and future mis16 ence Operations processes, distributes, and archives the data [30], [27], [28], [31]. NASA Integrated Services Network (NISN) mission services handle the data transfer between the mission operations system and the science operations systems [30]. dia gradually as costs and performances improve and as they are needed. To be able to do this in the most cost effective manner a system usage analysis should be conducted in order to properly size the storage and processing systems. For such a system there should be a detailed plan in place for the migration of data from old to new storage media as the older storage media becomes obsolete or nonfunctional [27]. There are several valuable lessons that were learned from the operation and continued evolution of EOSDIS. The Goddard Earth Sciences (GES) Distributed Active Archive Center (DAAC) and Langley Research Center (LaRC) Atmospheric Science Data Center (ASDC) evolutions emphasize reducing operations costs by maintaining only a single processing system, increasing automation through the use of simpler systems, reducing engineering costs through the replacement of COTS (commodity-off-the-shelf) products with commodity products, and increasing accessibility by increasing disk usage by phasing out existing tape based systems [28]. The parallel development of Archive, Next Generation (ANGē) and Simple, Scalable, Script-Based, Science Processing Archive (S4PA) at GES DAAC and LaRC ASDC DAAC demonstrate that NASA continues to encourage individual entities, such as the DAACs and Science Investigator-led Processing Systems (SIPSs), to suggest and develop new approaches [28]. A summary of the lessons learned from the development and continued operation of EOSDIS (from “Evolving a Ten Year Old Data System”[28]) are briefly listed below: Keeping all affected parties involved is especially relevant to the formatting of the data, which will be discussed later. Involvement of these parties combined with continuous smallscale prototyping speeds the generation and integration of new ideas and also serves as a risk management measure to prevent changes being made to the system that would be detrimental to the user community. Involvement of these parties also helps to determine how access to data should be kept online and how that data might be most easily accessed online. HyspIRI Adaptive Processing System HIIAPS (HyspIRI Adaptive Processing System), pronounced “high-apps,” is a proposed stand alone data processing system which would handle the data processing, storage, and distribution requirements of the HyspIRI mission. It would be based on ACPS with only the minor modifications that occur in adapting a previous system to the slightly different needs of another mission. JPL’s Team X states that “Science team has responsibility for all storage and analysis . . . Data is made available (online) to full science team for providing atmospheric corrections and producing products” [1]. An ACPSlike processing system would fulfill these requirements. HIIAPS would be a centralized dedicated SIPS and DAAClike system that would be co-located with the instrument and science teams. The MODIS Adaptive Processing System (MODAPS) has demonstrated that co-locating the science team with the processing center reduces cost by saving time on the development and integration of new algorithms [33]. It is also important to note that HIIAPS would have no commercial or proprietary parts thus evading the issues of software licensing. A diagram of the HIIAPS data processing context is shown in Figure 11. Having a program requirement for continuous technology assessment establishes a culture of innovation • An on-going, small-scale, prototyping effort within operational elements is important for exploring new technologies and hardware • Involvement by all affected parties is essential to evolution planning • It is necessary to collect and analyze system usage metrics (e.g. request patterns for various data products) in planning for the future • Flexible management processes accommodating innovation, speed, and efficiency are essential for increasing agility in development despite the higher perceived risks • A risk management strategy must be included in planning for evolution • Transitions affecting the user community need to be carefully managed with sufficient advance notification • All storage and analysis for the HyspIRI mission would be done on site. Level 1C data is archived and distributed at HIIAPS. HIIAPS will also process and archive Level 3 and below data leaving higher level products to academia and the private sector. Data products would be sent directly to the HypsIRI science team and an offsite archive. This offsite archive could be a DAAC or another data processing center that serves as a risk management precaution in the event of a natural disaster at the HIIAPS site. Like MODAPS, HIIAPS would employ data pool technology to save on storage and processing costs. Higher level, smaller data products and recently created products would be kept in the data pool while more cumbersome products like Level 1C data would be produced on demand. Also like previous systems, HIIAPS would provide custom data products in the form of virtual products that are made to order. By building upon the knowledge base All of these lessons point toward an independently operated system built for continuous evolution. As has been stated before HyspIRI generates extremely large amounts of data. A highly scalable and adaptive system would be the most cost effective approach to handle the processing, archiving, and distribution needs of the mission. The presence of a culture of innovation would facilitate the necessary continuous reassessment of the technology present at HIIAPS. Such technology reassessments would facilitate a smooth continuous integration of new processing and storage components in both the hardware and software realms. Thus initial development costs could be reduced by eliminating the need to purchase all storage media at once and instead purchasing storage me17 plished by distributing the data in a format that is easy to use. A standard format would be helpful because it would guarantee that everybody could read that file format. However, there is no single standard format for geospatial data [38]. NASA’s EOSDIS currently uses HDF-EOS5, which is good for processing but not everyone can easily read the HDF file format [39], [40], [41]. Even though there is no standard some formats are more easily accessible than others. Using a GIS (Geographic Information Systems)-ready format like a flat georeferenced JPEG or PNG image would make HyspIRI data readily accessible to an already large user base and to those who do not use a GIS program. The Open GIS ConsorR tium, Inc.(OGC) is working to solve this formatting issue and towards Open GIS [42], [43]. “Open GIS is the full integration of geospatial data into mainstream information technology. What this means is that GIS users would be able to freely exchange data over a range of GIS software systems and networks without having to worry about format conversion or proprietary data types.”[43] Working with the OGC, or at least staying in touch with their progress, would help to maximize interoperability and accessibility. Based on this HyspIRI should have at least two standard data formats. One would be HDF-EOS5 because that is the standard NASA EOS format and HIIAPS can draw on pervious NASA experience for it. The second would be a GIS ready format like a georeferenced JPEG or PNG to increase accessibility to HyspIRI data. Figure 11. HIIAPS Data Processing Flow of previous processing systems, HIIAPS would both reduce the costs and risk of creating a new processing center. This section looked at how HyspIRI might fit into the EOSDIS framework, several previous processing systems, and touched on accessibility and interoperability concerns. HyspIRI should take advantage of the present EOSDIS infrastructure. However, HIIAPS should be a stand alone processing center so that HyspIRI’s large data volume does not overtax the existing EOSDIS system. There are three heritage systems for HIIAPS: MODAPS, OMIDAPS, and ACPS, each an the evolution of the next. There has been a steady shift from tape-based storage to disk-based storage over the years. Thus HIIAPS should use a disk-based storage system. In addition to storage and processing, distribution is also a concern for the HyspIRI mission. HyspIRI data should be made available at no charge to as many people as possible in a manner accessible to as many people as possible. Some of the challenges are choosing a data format, reaching non-remote sensing scientists, and providing the data in an easy to find manner. Using Google Earth TM and supplying HyspIRI data products in more than one format will help to solve this challenge. Storage Media: Disk vs. Tape Initially, EOSDIS used tape-based storage. Now it is slowly transitioning to disk based storage [28], [31], [34]. Tape is easily transportable, has faster read/write speeds, but it is sequential access storage and a nearline media [35], [36], [37]. On the other hand, disk has a higher storage capacity per unit, makes the data easily accessible online, experiences more frequent technology upgrades, but has higher electrical costs and is more prone to error [35], [36], [37]. Disk storage is also more convenient to use because it is random access and data can be kept online (as opposed to nearline or offline) all the time, thus shortening data retrieval times. It was been shown that disk storage is more expensive in the short term, but in the long term the cost difference between disk and tape systems seems to converge [29]. This combined with the fact that ACPS, the Ozone Monitoring Instrument Data Processing System (OMIDAPS), and MODAPS are all disk based storage leads to the conclusion that HIIAPS should also be an entirely disk based system, especially with the Kryder’s Law (disk storage densities double every 23 months) taken into account [38]. 8. C ONCLUSION An end-to-end data system has been described for the HyspIRI mission, a high data volume Earth-observing mission. With the HyspIRI mission, ambitious science objectives are driving the development of a cutting-edge data handling system, both on-board the spacecraft and on the ground. Data Formats As the previous section states it is imperative that HyspIRI data is made available to as many non-remote sensing scientists as possible. Part of achieving this goal will be accom18 The HyspIRI mission utilizes innovative techniques to both reduce the amount of data that must be transmitted to the ground and accommodate the required data volume on the ground. The infrastructure and techniques developed by this mission will open the door to future high data volume science missions. The designs presented here are the work of the authors and may differ from the current HyspIRI mission baseline. [11] C. Burges, “A tutorial on support vector machines for pattern recognition,” Data Mining and Knowledge Discovery, vol. 2, pp. 121–167, 1998. [12] M. Klemish, “Fast lossless compression of multispectral imagery, internal jpl document,” October 2007. [13] F. Rizzo, “Low-complexity lossless compression of hyperspectral imagery via linear prediction,” p. 2, IEEE Signal Processing Letters, IEEE, 2005. ACKNOWLEDGMENTS [14] R. Roosta, “Nasa - jpl: Nasa electronic parts and packaging program.” http://nepp.nasa.gov/docuploads/3C8F70A32452-4336-B70CDF1C1B08F805/JPL%20RadTolerant%20FPGAs%20for%20Space%20Applications.pdf, December 2004. This research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, and was sponsored by the Space Grant program and the National Aeronautics and Space Administration. [15] I. Xilinx, “Xilinx : Radiation-hardened virtex-4 qpro-v family overview.” http://www.xilinx.com/support/documentation/ data sheets/ds653.pdf, March 2008. R EFERENCES [1] K. Warfield, T. V. Houten, C. Heeg, V. Smith, S. Mobasser, B. Cox, Y. He, R. Jolly, C. Baker, S. Barry, K. Klassen, A. Nash, M. Vick, S. Kondos, M. Wallace, J. Wertz Chen, R. Cowley, W. Smythe, S. Klein, L. Cin-Young, D. Morabito, M. Pugh, and R. Miyake, “Hyspiri-tir mission study 2007-07 final report, internal jpl document,” TeamX 923, Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Drive, Pasadena, CA 91109, July 2007. [2] R. O. Green, “Hyspiri summer 2008 overview,” 2008. Information exchanged during presentation. [3] S. Hook, 2008. Information exchanged during meeting discussion. July 16th. [4] R. O. Green, “Measuring the earth with imaging spectroscopy,” 2008. [5] “Moore’s law: Made real by intel innovation.” http://www.intel.com/technology/mooreslaw/index.htm. [6] T. Doggett, R. Greeley, S. Chein, R. Castano, and B. Cichy, “Autonomous detection of cryospheric change with hyperion on-board earth observing-1,” Remote Sensing of Environment, vol. 101, pp. 447–462, 2006. [7] [8] [9] [16] G. S. F. Center, “Tdrss overview.” http://msp.gsfc.nasa.gov/tdrss/oview.html, 7. [17] H. Hemmati, 07 2008. Information exchanged during meeting about LaserComm. [18] “Worldview-1.” http://www.digitalglobe.com/index.php /86/WorldView-1, 2008. [19] “Svalbard ground station, norway.” http://www.aerospacetechnology.com/projects/svalbard/, 7 2008. [20] “Satellite tracking ground http://www.asf.alaska.edu/stgs/, 2008. station.” [21] R. Flaherty, “Sn/gn systems overview,” tech. rep., Goddard Space Flight Center, NASA, 7 2002. [22] “Geoeye-1 fact sheet.” http://launch.geoeye.com/launchsite/about/fact sheet.aspx, 2008. [23] L-3 Communications/Cincinnati Electronics, Space Electronics, 7500 Innovation Way, Mason, OH 45040, T-720 Transmitter, 5 2007. PDF Spec Sheet for the T720 Ku-Band TDRSS Transmitter. R. Castano, D. Mazzoni, N. Tang, and T. Dogget, “Learning classifiers for science event detection in remote sensing imagery,” in Proceedings of the ISAIRAS 2005 Conference, 2005. [24] L-3 Communications/Cincinnati Electronics, Space Electronics, 7500 Innovation Way, Mason, OH 45040, T-722 X-Band, 7 2007. PDF Spec Sheet for the T-722. S. Shiffman, “Cloud detection from satellite imagery: A comparison of expert-generated and autmatically-generated decision trees.” ti.arc.nasa.gov/m/pub/917/0917 (Shiffman).pdf. [25] J. Smith, 07 2008. Information exchanged during meeting about GDS. [26] J. Carpena-Nunez, L. Graham, C. Hartzell, D. Racek, T. Tao, and C. Taylor, “End-to-end data system design for hyspiri mission.” Jet Propulsion Laboratory, Education Office, 2008. M. Griggin, H. Burke, D. Mandl, and J. Miller, “Cloud cover detection algorithm for eo-1 hyperion imagery,” Geoscience and Remote Sensing Symposium, 2003. IGARSS ’03. Proceedings. 2003 IEEE International, vol. 1, pp. 86–89, July 2003. [27] J. Behnke, T. Watts, B. Kobler, D. Lowe, S. Fox, and R. Meyer, “Eosdis petabyte archives: Tenth anniversary,” Mass Storage Systems and Technologies, 2005. Proceedings. 22nd IEEE / 13th NASA Goddard Conference on, pp. 81–93, April 2005. [10] V. Vapnik, Advances in Kernel Methods: Support Vector Learning. MIT Press, 1999. 19 [28] M. Esfandiari, H. Ramapriyan, J. Behnke, and E. Sofinowski, “Evolving a ten year old data system,” Space Mission Challenges for Information Technology, 2006. SMC-IT 2006. Second IEEE International Conference on, pp. 8 pp.–, July 2006. [42] “Welcome to the ogc http://www.opengeospatial.org/, 2008. [43] “Open gis: Gis lounge - geographic information systems.” http://gislounge.com/open-gis/. [29] S. Marley, M. Moore, and B. Clark, “Building costeffective remote data storage capabilities for nasa’s eosdis,” Mass Storage Systems and Technologies, 2003. (MSST 2003). Proceedings. 20th IEEE/11th NASA Goddard Conference on, pp. 28–39, April 2003. Christine M. Hartzell received her B.S. in Aerospace Engineering for Georgia Institute of Technology with Highest Honors in 2008. She is currently a PhD student at the University of Colorado at Boulder, where she is researching the impact of solar radiation pressure on the dynamics of dust around asteroids. She has spent two summers working at JPL on the data handling system for the HyspIRI mission, with particular emphasis on the cloud detection algorithm development and instrument design. [30] “Earth science data and information system (esdis) project.” http://esdis.eosdis.nasa.gov/index.html. [31] M. Esfandiari, H. Ramapriyan, J. Behnke, and E. Sofinowski, “Evolution of the earth observing system (eos) data and information system (eosdis),” Geoscience and Remote Sensing Symposium, 2006. IGARSS 2006. IEEE International Conference on, pp. 309–312, 31 2006Aug. 4 2006. [32] “Earth science mission operations http://eos.gsfc.nasa.gov/esmo/. (esmo).” Jennifer Carpena-Nunez received her B.S. in physics in 2008 from the University of Puerto Rico, where she is currently a PhD student in Chemical Physics. Her research involves field emission studies of nanostructures and she is currently developing a field emission setup for further studies on nanofield emitters. The summer of 2008 she worked at JPL on the HyspIRI mission. There, she was responsible for the science analysis of the data handling system, specifically defining the data level and processing, and determining potential mission collaborations. [33] E. Masuoka and M. Teague, “Science investigatorled global processing for the modis instrument,” Geoscience and Remote Sensing Symposium, 2001. IGARSS ’01. IEEE 2001 International, vol. 1, pp. 384–386 vol.1, 2001. [34] M. Esfandiari, H. Ramapriyan, J. Behnke, and E. Sofinowski, “Earth observing system (eos) data and information system (eosdis) — evolution update and future,” Geoscience and Remote Sensing Symposium, 2007. IGARSS 2007. IEEE International, pp. 4005–4008, July 2007. [35] D. McAdam, “The evolving role of tape in the data center,” The Clipper Group: Explorer, December 2006. [36] “Sun microsystems announces first one terabyte tape storage http://www.sun.com/aboutsun/pr/200807/sunflash.20080714.2.xml, July 2008. website.” world’s drive.” Lindley C. Graham is currently a junior at the Massachusetts Institute of Technology, where she is working towards a B.S. in Aerospace Engineering. She spent last summer working at JPL on the data handling system for the HyspIRI mission, focusing on developing a data storage and distribution strat- [37] “Panasas — welcome.” http://www.panasas.com/. [38] R. Domikis, J. Douglas, and L. Bisson, “Impacts of data format variability on environmental visual analysis systems.” http://ams.confex.com/ams/pdfpapers/119728.pdf. egy. [39] “Why did nasa choose hdf-eos as the format for data products from the earth observing system (eos) instruments?.” http://hdfeos.net/reference/Info docs/SESDA docs/NASA chooses HDFEOS.php, July 2001. [40] R. E. Ullman, “Status and plans for hdfeos, nasa’s format for eos standard products.” http://www.hdfeos.net/hdfeos status/ HDFEOSStatus.htm, July 2001. [41] “Hdf esdis project.” http://hdf.ncsa.uiuc.edu/projects/esdis/index.html, August 2007. 20 David M. Racek is a senior working toward a B.S. in Computer Engineering at Montana State University. He works in the Montana State Space Science and Engineering Laboratory where he specializes in particle detector instruments and circuits. He spent last summer working at JPL on compression algorithms for the HyspIRI mission. advanced scientific software for Earth and space science modeling with an emphasis on high performance computing and finite element adaptive methods. Additionally he is leading efforts in development of smart payload instrument concepts. He has given 32 national and international keynote/invited talks; published in numerous journals, conference proceedings, and book chapters. He is a member of the editorial board of the journal Scientific Programming, the IEEE Technical Committee on Scalable Computing, a Senior Member of IEEE, recipient of the JPL Lew Allen Award, and a NASA Exceptional Service Medal. Tony S. Tao is currently a junior honor student at the Pennsylvania State University, working towards a B.S. in Aerospace Engineering and a Space Systems Engineering Certification. Tony works in the PSU Student Space Programs Laboratory as the project manager of the OSIRIS Cube Satellite and as a systems engineer on the NittanySat nanosatellite, both of which aim to study the ionosphere. During his work at JPL in the summer of 2008, Tony worked on the communication and broadcast system of the HyspIRI satellite as well as a prototype Google Earth module for science product distribution. Christianna E. Taylor received her B.S. from Boston University in 2005 and her M.S. at Georgia Institute of Technology in 2008. She is currently pursing her PhD at the Georgia Institute of Technology and plans to pursue her MBA and Public Policy Certificate in the near future. She worked on the ground station selection for the HyspIRI mission during the summer of 2008 and looks forward to working at JPL in the coming year as a NASA GSRP fellow. Hannah R. Goldberg received her M.S.E.E and B.S.E from the Department of Electrical Engineering and Computer Science at the University of Michigan in 2004 and 2003 respectively. She has been employed at the Jet Propulsion Laboratory, California Institute of Technology since 2004 as a member of the technical staff in the Precision Motion Control and Celestial Sensors group. Her research interests include the development of nano-class spacecraft and microsystems. Charles D. Norton is a Principal Member of Technical Staff at the Jet Propulsion Laboratory, California Institute of Technology. He received his Ph.D in Computer Science from Rensselaer and his B.S.E in Electrical Engineering and Computer Science from Princeton University. Prior to joining JPL he was a National Research Council resident scientist. His work covers 21