SMOV3b Lessons Learned
Sep. 2002

Table of Contents
Lessons 1–5: SI Operations
Lessons 6–7: Flight Ops
Lessons 8–13: Data Processing
Lessons 14–16: SMOV Planning Ground Testing

Lesson 1
Issue: MAMA shutdown due to low Global Count Rate Setting
Originator: G. De Marchi (ACS group)
Priority: 1 – Required
Lesson Learned: During the MAMA Fold Analysis test, the SBC shut down when a global count rate limit was violated. While this was not actually a problem for the detector, such shutdowns are undesirable. The incident occurred because a neutral density filter was changed during one of the later ACS ground tests and the resulting illumination level changes were not propagated to all of the necessary proposals. The MAMA Fold Test uses particularly low threshold limits because it disables a series of MAMA anode wires in sequence. Ideally, the Fold Analysis Proposal would have been rerun in a thermal vacuum test against the real detector.
Recommendation: Ensure that all ground test data and results are crosschecked against SMOV proposals. Ideally, all special commanding proposals should be re-run against hardware in the appropriate environment whenever a relevant change is made to the instrument.
Implementation Plan: The STScI Proposal Implementation Team (PIT) will be augmented with representation from the SMGT and the Instrument Test Team to ensure that SMOV programs are consistent with instrument test history and that appropriate testing is done if they are not.
Actionee: STScI SMOV Lead

Lesson 2
Issue: NICMOS cooldown rate slower than expected
Originator: L. Bassford
Priority: 1 – Required 2 – Highly Desired 3 – Nice to have 4 – Already Implemented
Lesson Learned: Safing NICMOS aided the cooldown rate. Expect that any future cooldowns will take longer than initially predicted and will depend on the length of time the NICMOS Cooling System is off.
Recommendation: An accurate, predictive thermal model for the NCS and NICMOS system should be developed for the expected thermal environment and compressor speeds. From this model, Project decisions can be made regarding the safing of NICMOS to aid in cooldown.
Implementation Plan: Thermal Group (TCS) will develop the model.
Actionee: TCS (C. Cottingham)

Lesson 3
Issue: ACS Memory Dumps
Originator: L. Bassford
Priority: 1 – Required 2 – Highly Desired 3 – Nice to have 4 – Already Implemented
Lesson Learned: Closing HSTPS/SDH ground collection sessions too early can cause difficulty for Payload FSW verification.
Recommendation: Close collection either between files or after all instrument memory dump files have been completed and dumped to SSR.
Implementation Plan: Better synchronize SMOV plan observations with ground collection procedures, so that the G/S collection stop is not issued until everything is collected (instead of in segments, as with ACS).
Actionee: SI SE Lynn Bassford and STScI PIT team.

Lesson 4
Issue: Expected status buffer (STB) messages or Out of Limits
Originator: L. Bassford
Priority: 1 – Required 2 – Highly Desired 3 – Nice to have 4 – Already Implemented
Lesson Learned: Sometimes proposals intentionally command SIs in ways that result in known/expected STB messages, causing alarm among Ops and STScI personnel (and the community reading the morning reports on the web) who are unfamiliar with the specific operations of an SI. (Example: an ACS event involving a change in direction yielded, due to mechanical backlash, JERRCNT & JHFCSSTA OOL flags with STB messages. This was a known ACS feature which many interpreted as an anomaly.)
Recommendation: Add notations and/or specific STB message information to the SMOV plan whenever it is expected that an SI move or feature will result in the posting of an STB message (or out-of-limit flag). Also, for ACS specifically, a COP has been updated to include more information on this feature.
Implementation Plan: The situation will be identified by the Proposal Implementation Team (PIT), which will ensure that a comment is included in the SMS delivery.
Actionee: STScI PIT

Lesson 5
Issue: NICMOS MEBs 1 & 2 and Rad temps OOL high
Originator: L. Bassford
Priority: 1 – Required 2 – Highly Desired 3 – Nice to have 4 – Already Implemented
Lesson Learned: There are extra Aft Shroud heat conditions with the addition of ACS and the ESM, as well as scheduled high sun angles. Expect that temperatures may reach safing limits by the fall/winter warm period.
Recommendation: Raise on-board safing limits (with Ball’s and Part’s concurrence).
Implementation Plan: Change implemented on-orbit on 2002/157 16:55:00 GMT with OR#16783.
Actionee: SI SE (Lynn Bassford)

Lesson 6
Issue: Ops Request (OR) generation
Originator: K. Walyus
Priority: 1 – Required 2 – Highly Desired 3 – Nice to have 4 – Already Implemented
Lesson Learned: The Ops Manager was left with very little time for the OR approval process due to:
- timing of the SMOV schedule (very little time between determination of the data to be supplied to the Ops Requests’ arguments and execution)
- multiple re-submittals of final ORs due to revisions requested in STScI Team comments
Recommendation: Write the OR ahead of time as a draft, without filling in specific data until the final operational version. All reviewers should review the draft and make suggestions/comments, so that by the time a final version is submitted (after data determination has been completed) no comments or revisions should be necessary. At that point, merely check that no part of the procedure has changed from the draft form. Give approving NASA Managers a heads-up during the final stages of the draft form.
Implementation Plan: Update/clarify OP procedures for SMOV4.
Actionee: STScI SMOV Lead and Project SMOV Manager

Lesson 7
Issue: The SMOV schedule did not account for the time needed by the STScI Commanding, FSW and/or SI SE Teams for real-time implementation.
Originator: Art Rankin
Priority: 1 – Required 2 – Highly Desired 3 – Nice to have 4 – Already Implemented
Lesson Learned: Adequate review time needs to be added to the schedule when database or flight software updates are being planned. The amount of review time required can be decreased by pre-mission testing of any potential changes. Confidence in possible real-time updates could be reinforced by getting copies of “sample” updates ahead of time. From the discussion above, many of the "quick" updates required testing on an off-line string prior to their execution on orbit. This added preparation time, so in effect they needed more lead time than if they had been exercised as part of SMGTs such as the system compatibility or individual SI SMGTs.
Recommendation: Determine pre-mission whether any changes could be made as a result of SMOV tests. Test the mechanism for making these changes pre-mission. If feasible, run “sample” real-time updates, which are part of the SMOV program, in an SMGT.
Implementation Plan: In the SMOV planning phase, identify all implied real-time operations for data updates and determine appropriate schedule intervals to accommodate them.
Actionee: SMOV Planning Team and PIT

Lesson 8
Issue: NICMOS FW Test
Originator: L. Bassford and K. Walyus
Priority: 1 – Required 2 – Highly Desired 3 – Nice to have 4 – Already Implemented
Lesson Learned: Special Engineering Tests require ground testing.
Recommendation: During the preparation of the activity summaries and the proposals, the SMOV team should assess the need for any special engineering tests that will be executed during the SMOV period. These tests should be executed pre-mission.
Implementation Plan: The SMOV activity summaries, generated during the planning phase, will include a section for identifying any needed special engineering tests.
Also, the STScI Proposal Implementation Team (PIT) will be augmented with representation from the SMGT and the Instrument Test Team to help make such identifications. The STScI SMOV Lead and the Project Systems Management Manager will ensure that such tests are scheduled and carried out.
Actionee: STScI SMOV Lead (Carl Biagetti), Project Systems Management Manager (J. Gainsborough)

Lesson 9
Issue: Science data processing capacity issues.
Originator: Albert Holm
Priority: 1 – Required 2 – Highly Desired 3 – Nice to have 4 – Already Implemented
Lesson Learned: Science data processing, ingest and distribution were halted at about 2 am (local) on Wednesday, May 15, due to system problems on the ACDSDOPS cluster. CISD determined that very full and very fragmented disks, combined with the termination of a piece of operating system software, were the causes of the failure. Recovery from this failure and restoration of the data were completed on May 18. A second crash on May 19 necessitated a second system restore, but “in work” OPUS data files were not recoverable (transient data files are not included in the CISD backup process).
The Archive Hotseat was notified and kept users informed about our downtime. Since we could not receive science data, OPUS staff notified PACOR/DCF personnel of our situation and kept them informed of our status throughout the recovery effort. OPUS staff also kept the NICMOS and ACS teams (expecting quicklook data) informed of our situation. After significant effort by CISD, our systems were returned to normal status by Monday morning, May 20.
In addition to processing current data through the recovered pipeline since May 20, OPUS staff identified over 1700 affected observations that required reprocessing. For about 40% of the observations, it was necessary for PACOR-A to retransmit the POD source files. OPUS and PACOR-A staff completed this reprocessing effort on June 18.
A number of circumstances contributed to the disk failures and to the loss of unarchived data. System loading and data storage capacity became a concern early this year. New hardware was obtained to help alleviate the problem, but no analysis existed to show whether the improvements would be sufficient to handle the large-volume ACS observations as well as the large number of NICMOS observations in both the pre-archive and the OTFR pipelines. Had the disk failures occurred when SMOV-critical observations were in the pipeline, those SMOV programs could have been delayed.
Recommendation: Systems engineers should scope data processing capacity requirements prior to the installation of new features, such as OTFR, major calibration improvements, replacement science instruments, etc. Appropriate hardware capacity improvements should also be coordinated with the implementation of these features.
Implementation Plan: Ray Kutina has been chartered to carry out an analysis of the processing capacity needs for the near future, as well as for after SM4 and after SHARE project enhancements are added to the OTFR pipeline. Further implementation steps are to be determined.
Actionee: Ray Kutina

Lesson 10
Issue: Incompatible requirements for data processing.
Originator: Albert Holm
Priority: 1 – Required 2 – Highly Desired 3 – Nice to have 4 – Already Implemented
Lesson Learned: Proposal specifications for several ACS SMOV programs resulted in directions to deliver products that were incompatible with the kind of processing designated. Specifically, proposals 8948, 9020, and 9022 used Repeat Obs specifications for internal observations not intended to be processed through calibration. The front-end systems convert the Repeat Obs construction into a requirement that the observations be associated. Association is performed in the calibration routines that these observations were required to skip.
The result was the production of inconsistent products that the archive system rejected. Eventually about 500 ACS observations failed to be ingested because of this inconsistency. The fix involved manually altering PMDB tables, reprocessing the data, and manually altering some keywords in the products. The reprocessing load may have contributed to the meltdown of the data processing systems on May 15 and 19.
Recommendation: Proposal specifications used during SMOV should not differ from the kinds of specifications used during Ground System Testing. By testing in the same manner as will be used in flight, potential data processing errors can be uncovered and resolved before SMOV.
Implementation Plan: The Proposal Implementation Team will identify proposal/ground_system discrepancies during the proposal processing phase.
Actionee: PIT

Lesson 11
Issue: Reliability of processing and archiving systems
Originator: G. De Marchi (ACS group)
Priority: 1 – Required
Lesson Learned: The ACS is the first HST instrument to generate a daily data volume comparable to the downlink capabilities of the HST ground system. During SMOV3B, archiving (DADS) and retrieval (Archive) were at times seriously hindered and delayed by ingest and retrieval times. The ACS group had planned for handling large SMOV data volumes by having a group “on-line” archive so that all SMOV data was always available to the group. However, early access to ACS data, particularly for the IDT, was sometimes compromised by the retrieval times. The ACS group filled this gap by making most SMOV data directly available to the IDT.
Recommendation: Consider making SMOV data directly available (e.g., ftp from OPUS) to the science instrument teams.
This is particularly important for SM4 and the WFC3, where three new detector systems will generate a considerable increase in the number and volume of the data that the ground system will have to downlink and ingest every day, in addition to those produced by the ACS. See also the recommendation for Lesson Learned 9 for predicting data processing requirements.
Implementation Plan: If deemed necessary, and to the extent feasible, plan such special data transfers during the SMOV operations planning phase. See also the implementation plan for L.L. 9 for ground system engineering in preparation for servicing missions and L.L. 13 for instrument team local storage.
Actionee: SMOV Operations Planning Team (including OPUS and SI teams)

Lesson 12
Issue: Pipeline processing logic
Originator: G. De Marchi (ACS group)
Priority: 2 – Highly Desired
Lesson Learned: During the early days of operation of the ACS it was noted that the time required to process some calibration data (mostly internal) could have been considerably reduced by adjustment of switches and keywords. The baseline implementation of these switches was based on rules adopted by STIS, without regard for processing time and data volume. These proved unworkable for ACS due to excessive data volume and processing times.
Recommendation: When the pipeline is designed and implemented for new, high data volume instruments, additional care should be taken to ensure that the software performs only essential operations.
Implementation Plan: A representative of each science instrument team provides the switch and keyword requirements to STScI DST.
Actionee: SI teams and DST

Lesson 13
Issue: Local archival facility
Originator: G. De Marchi (ACS group)
Priority: 4 – Already Implemented
Lesson Learned: In preparation for the SMOV3B phase, the ACS group set up a large capacity storage area to host the raw and processed data files.
The OPUS branch agreed to deliver to this disk area, on a daily basis, all ACS data received and processed before sending them to the Archive. This arrangement allowed uninterrupted access to the ACS data throughout the SMOV period, regardless of the status of the Archive, and proved particularly useful for timely processing and analysis of all images, including the early release observations.
Recommendation: Future HST instruments may find it useful to set up a local storage area, on a fast network, for efficient and prompt data processing during SMOV.
Implementation Plan: SI teams to provide if deemed necessary.
Actionee: SI teams

Lesson 14
Issue: Finishing Up SMOV Proposals – items not being done (forgotten)
Originator: Ron Pitts
Priority: 1 – Required 2 – Highly Desired 3 – Nice to have 4 – Already Implemented
Lesson Learned: To ensure that special commanding/ISQL, scheduling, and special data handling requirements are completed correctly and that special commanding executes as expected, proposals may need to have precise comments in the proposal header or on exposure lines, or to have certain optional parameters or special requirements specified in the proposal. The actual work may be done months later, so the proposal must be precisely written. In SMOV3B, after a PIT meeting, a proposal would sometimes be “almost” correct with one or two outstanding items. In such cases, the PI/PC was instructed to complete the one or two outstanding items and resubmit the proposal, without having to return to the PIT. On a number of proposals, other work, intervening vacations, etc. resulted in the final changes not being made. As a result, several proposals needed special commanding/scheduling changes at the last minute, when deadlines for launch were near, or did not have optimal scheduling because a critical comment concerning a scheduling/data handling requirement had been left out.
In short, certain problems that were caught in the PIT process happened anyway and should not have occurred.
Recommendation: Either proposals need to be returned to the PIT unless NO changes are required, or an action list needs to be maintained by the PIT meeting coordinator who writes up the meeting minutes. If the action item list option is used, it should also contain the dates by which the actions are to be completed, and it should be visible to the PIT.
Implementation Plan: PIT team to modify its procedures according to the recommendation.
Actionee: STScI Proposal Implementation Team (PIT)

Lesson 15
Issue: Calibration module in SMGT
Originator: G. De Marchi (ACS group)
Priority: 1 – Required
Lesson Learned: In the course of SMOV3B a number of special modes were employed to test ACS functionality that had not previously been tested on the ground system. The primary reason is that these modes in general were not implemented in the ACS pipeline and were run during SMOV to provide “special-mode” baseline performance data. For example, only certain amplifier readout configurations are supported by the ACS ground system. However, following analysis of a STIS readout noise problem after SM3B, it was decided that baseline bias and readout noise data should be obtained for each amplifier reading out through the whole CCD, so that baseline figures would be available for future diagnostics if required. Such special mode observations were not carried in the SMGT, as this test is designed to simulate normal daily science operations and validate all standard engineering modes.
Recommendation: If required, add a special calibration/checkout module to SMGT and carefully test all modes, including special “one-time only” modes whose use can be conceived of during the SMOV phase. It may be that we still allow these observations to fail gracefully in the pipeline.
Implementation Plan: Ground test requirements will be identified in the detailed SMOV plan.
The SMGT group will assist the PIT team in determining the appropriate test.
Actionee: STScI SMOV Lead and STScI PIT supported by an SMGT contact

Lesson 16
Issue: The need to power down SI and PCS high voltages as risk mitigation against NCS cooldown leakage was identified very late in the planning process.
Originator: Biagetti
Priority: 1 – Required 2 – Highly Desired 3 – Nice to have 4 – Already Implemented
Lesson Learned: While the need to disable SI MAMA high voltages as risk mitigation during NICMOS cooldown was identified early in the planning process and incorporated into the SMOV plan, the possible risk to FGS high voltages was identified only a few weeks before launch. The resulting decision to turn off FGS and FHST high voltages during the cooldown caused major perturbations to the early SMOV plan, requiring late re-planning, manual intervention in the timeline to implement the replan, and an Observatory PCS system that was inhibited for several days.
Recommendation: Identify system interdependencies and impacts as early as possible in the SMOV planning process.
Implementation Plan: Include system implications as a normal part of the SMOV planning phase.
Actionee: STScI/Project SMOV Planning Team