SMOV3b Lessons Learned

Sep. 2002
Table of Contents
Lessons 1 – 5: SI Operations
Lessons 6 – 7: Flight Ops
Lessons 8 – 13: Data Processing
Lessons 14 – 16: SMOV Planning, Ground Testing
Lesson 1
Issue: MAMA shutdown due to low Global Count Rate Setting
Originator: G. De Marchi (ACS group)
Priority:
1 – Required
Lesson Learned: During the MAMA Fold Analysis test, the SBC shut down
when a global count rate limit was violated. Although this was not actually a
problem for the detector, such shutdowns are undesirable. The incident
occurred because a neutral density filter was changed during one of the
later ACS ground tests and the resulting change in illumination level was
not propagated to all of the necessary proposals. The MAMA Fold Test uses
particularly low threshold limits because it disables MAMA anode wires in
sequence. Ideally, the Fold Analysis Proposal would have been rerun in a
thermal vacuum test against the real detector.
Recommendation: Ensure that all ground test data and results are
cross-checked against SMOV proposals. Ideally, all special commanding proposals
should be re-run against hardware in the appropriate environment whenever
a relevant change is made to the instrument.
Implementation Plan: The STScI Proposal Implementation Team (PIT) will
be augmented with representation from the SMGT and the Instrument Test
Team to assure that SMOV programs are consistent with instrument test
history and that appropriate testing is done if they are not.
Actionee: STScI SMOV Lead
Lesson 2
Issue: NICMOS Cool down rate slower than expected
Originator: L. Bassford
Priority:
1 – Required
2 – Highly Desired
3 – Nice to have
4 – Already Implemented
Lesson Learned: Safing NICMOS aided in cooldown rate; expect that any
future cooldowns will take longer than initially predicted and will depend on
the length of time the NICMOS Cooling System is off.
Recommendation: An accurate, predictive thermal model for the NCS and
NICMOS system should be developed for the expected thermal environment
and compressor speeds. From this model, Project decisions can be made
regarding the safing of NICMOS to aid in cooldown.
Implementation Plan: Thermal Group (TCS) will develop the model.
Actionee: TCS (C. Cottingham)
Lesson 3
Issue: ACS Memory Dumps
Originator: L. Bassford
Priority:
1 – Required
2 – Highly Desired
3 – Nice to have
4 – Already Implemented
Lesson Learned: Closing HSTPS/SDH ground collection sessions too early
can cause difficulty for Payload FSW verification.
Recommendation: Close collection either between files or only after all
instrument memory dump files have been completed and dumped to SSR.
Implementation Plan: Better synchronize SMOV plan observations with
ground collection procedures, such that the G/S collection stop is not issued
until everything is collected (rather than in segments, as was done with ACS).
Actionee: SI SE Lynn Bassford and STScI PIT team.
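The synchronization described above amounts to a simple gate: the ground-system collection stop is issued only once every expected dump file has been collected. A minimal Python sketch of that check follows. The function name and file names are hypothetical and do not correspond to any actual HSTPS/SDH procedure.

```python
def collection_stop_allowed(expected_files, collected_files):
    """Return (ok, missing); ok is True only when no expected file is missing."""
    missing = sorted(set(expected_files) - set(collected_files))
    return len(missing) == 0, missing


# File names below are placeholders, not real dump products.
expected = ["dump_a", "dump_b", "dump_c"]
ok, missing = collection_stop_allowed(expected, ["dump_a", "dump_c"])
print(ok, missing)  # False ['dump_b']
```

Returning the list of missing files, rather than a bare yes/no, would let the operator see what is still outstanding before the stop is issued.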
Lesson 4
Issue: Expected status buffer (STB) messages or Out of Limits
Originator: L. Bassford
Priority:
1 – Required
2 – Highly Desired
3 – Nice to have
4 – Already Implemented
Lesson Learned: Proposals sometimes intentionally command SIs in ways that
produce known, expected STB messages. These can alarm Ops and STScI
personnel (and the community reading the morning reports on the web) who
are unfamiliar with the specific operations of an SI. (Example: an ACS event
involving a change in direction yielded, due to mechanical backlash, JERRCNT &
JHFCSSTA OOL flags with STB messages. This was a known ACS feature
which many interpreted as an anomaly.)
Recommendation: Add notations and/or specific STB message information
to the SMOV plan whenever it is expected that an SI move or feature will
result in posting a STB message (or out of limit flag.) Also, for ACS
specifically, a COP has been updated to include more info on this feature.
Implementation Plan: The situation will be identified by the Proposal
Implementation Team (PIT) which will ensure that a comment is included in
the SMS delivery.
Actionee: STScI PIT
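One way to picture the recommendation is a table of expected flags attached to each planned activity, so that a report can separate anticipated STB/OOL flags from genuine surprises. The following is a minimal Python sketch under that assumption, not an existing Ops tool. The activity name, the table layout, and the function name are hypothetical; only the mnemonics JERRCNT and JHFCSSTA come from the example above.

```python
# Hypothetical annotation table: activity -> flags the SMOV plan says to expect.
EXPECTED_FLAGS = {
    "acs_mechanism_direction_change": {"JERRCNT", "JHFCSSTA"},
}


def classify_flags(activity, observed_flags):
    """Split observed OOL/STB flags into expected and unexpected lists."""
    expected = EXPECTED_FLAGS.get(activity, set())
    observed = set(observed_flags)
    return {
        "expected": sorted(observed & expected),
        "unexpected": sorted(observed - expected),
    }


report = classify_flags(
    "acs_mechanism_direction_change",
    ["JERRCNT", "JHFCSSTA", "XUNKNOWN"],  # XUNKNOWN is a made-up mnemonic
)
print(report)
```

An activity with no entry in the table is treated as having no expected flags, so anything it raises is reported as unexpected.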
Lesson 5
Issue: NICMOS MEBs 1 & 2 and Rad temps OOL high
Originator: L. Bassford
Priority:
1 – Required
2 – Highly Desired
3 – Nice to have
4 – Already Implemented
Lesson Learned: The addition of ACS and the ESM, together with scheduled
high sun angles, adds heat to the Aft Shroud. Expect that temperatures
may reach safing limits by the fall/winter warm period.
Recommendation: Raise on-board safing limits (with Ball’s and Part’s
concurrence).
Implementation Plan: Change implemented on-orbit on 2002/157 16:55:00
GMT with OR#16783.
Actionee: SI SE (Lynn Bassford)
Lesson 6
Issue: Ops Request (OR) generation
Originator: K. Walyus
Priority:
1 – Required
2 – Highly Desired
3 – Nice to have
4 – Already Implemented
Lesson Learned: The Ops Manager was left with very little time for the OR
approval process due to:
- timing of the SMOV schedule (very little time between determination
of the data to be supplied as Ops Request arguments and execution)
- multiple resubmittals of final ORs in response to revisions requested
in STScI Team comments
Recommendation: Write each OR ahead of time as a draft, without filling in
specific data until the final operational version. All reviewers should
review the draft and make suggestions/comments, so that by the time the
final version is submitted (after data determination has been completed) no
further comments or revisions should be necessary. Reviewers then merely
check that no part of the procedure has changed from the draft. Give the
approving NASA Managers a heads-up during the final stages of the draft.
Implementation Plan: Update/clarify OP procedures for SMOV4.
Actionee: STScI SMOV Lead and Project SMOV Manager
Lesson 7
Issue: SMOV schedule didn’t account for the time needed by STScI
Commanding, FSW and/or SI SE Teams for real time implementation.
Originator: Art Rankin
Priority:
1 – Required
2 – Highly Desired
3 – Nice to have
4 – Already Implemented
Lesson Learned: When database or flight software updates are being planned,
adequate review time needs to be built into the schedule. The amount of
review time required can be decreased by pre-mission testing of any
potential changes.
Confidence in possible real-time updates could be reinforced by getting copies
of “sample” updates ahead of time. Many of the "quick" updates required
testing on an off-line string prior to their execution on orbit. This added
preparation time, so in effect they needed more lead time than if they had
been exercised as part of SMGTs such as the system compatibility or
individual SI SMGTs.
Recommendation: Determine pre-mission whether any changes could be made as
a result of SMOV tests. Test the mechanism for making these changes
pre-mission. If feasible, run “sample” real-time updates, which are part of
the SMOV program, in an SMGT.
Implementation Plan: In the SMOV planning phase, identify all implied
realtime operations for data updates and determine appropriate schedule
intervals to accommodate them.
Actionee: SMOV Planning Team and PIT
Lesson 8
Issue: NICMOS FW Test
Originator: L. Bassford and K. Walyus
Priority:
1 – Required
2 – Highly Desired
3 – Nice to have
4 – Already Implemented
Lesson Learned: Special Engineering Tests require ground testing.
Recommendation: The SMOV team should assess the need for any special
engineering tests during the preparation of the activity summaries and the
proposals, for tests that will be executed during the SMOV period. These tests
should be executed pre-mission.
Implementation Plan: The SMOV activity summaries, generated during the
planning phase, will include a section for identifying any needed special
engineering tests. Also, STScI Proposal Implementation Team (PIT) will be
augmented with representation from the SMGT and the Instrument Test Team
to help make such identifications. The STScI SMOV Lead and the Project
Systems Management Manager will ensure that such tests are scheduled and
carried out.
Actionee: STScI SMOV Lead (Carl Biagetti), Project Systems Management
Manager (J. Gainsborough)
Lesson 9
Issue: Science data processing capacity issues.
Originator: Albert Holm
Priority:
1 – Required
2 – Highly Desired
3 – Nice to have
4 – Already Implemented
Lesson Learned: Science data processing, ingest and distribution were halted
at about 2am (local) on Wednesday, May 15, due to system problems on the
ACDSDOPS cluster. CISD determined that very full and very fragmented
disks combined with the termination of a piece of operating system software
were the causes of the failure. Recovery from this failure and restoration of
the data were completed on May 18. A second crash on May 19 necessitated a
second system restore, but “in work” OPUS data files were not recoverable
(transient data files, not included in CISD backup process). The Archive
Hotseat was notified and kept users informed about our downtime. Since we
could not receive science data, OPUS staff notified PACOR/DCF personnel of
our situation and kept them informed of our status throughout the recovery
effort. OPUS staff also kept the NICMOS and ACS teams (expecting
quick-look data) informed of our situation. After significant effort by CISD, our
systems were returned to a normal status by Monday morning, May 20. In
addition to processing current data through the recovered pipeline since May
20, OPUS staff identified over 1700 affected observations that required
reprocessing. For about 40% of the observations, it was necessary for
PACOR-A to retransmit the POD source files. OPUS and PACOR-A staff
completed this reprocessing effort on June 18.
A number of circumstances contributed to the disk failures and to the loss of
unarchived data. System loading and data storage capacity became a concern
early this year. New hardware was obtained to help alleviate the problem, but
no analysis existed to show whether the improvements would be sufficient to
handle the large volume ACS observations as well as the large number of
NICMOS observations in both the pre-archive and the OTFR pipelines.
Had the disk failures occurred when SMOV-critical observations were in the
pipeline, those SMOV programs could have been delayed.
Recommendation: It is recommended that systems engineers scope data
processing capacity requirements prior to the installation of new features,
such as OTFR, major calibration improvements, replacement science
instruments, etc. It is further recommended that appropriate hardware
capacity improvements be coordinated with implementation of these features.
Implementation Plan: Ray Kutina has been chartered to carry out an
analysis of the processing capacity needs for the near future, as well as for
after SM4 and after SHARE project enhancements are added to the OTFR
pipeline. Further implementation steps are to be determined.
Actionee: Ray Kutina
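The figures quoted above imply that roughly 680 of the 1700-plus affected observations needed retransmitted source files (40% of 1700). A capacity-scoping analysis of the kind recommended could begin with equally simple arithmetic. The Python sketch below is illustrative only: the function names are hypothetical, and the daily rates in the second call are placeholders invented for the example, not measured pipeline values.

```python
import math


def retransmit_estimate(affected, fraction):
    """Approximate number of observations needing retransmitted source files."""
    return round(affected * fraction)


def days_to_clear(backlog, daily_capacity, daily_incoming):
    """Days needed to work off a backlog while new data keeps arriving.

    Returns None when capacity does not exceed the incoming rate, because
    the backlog would then never shrink.
    """
    spare = daily_capacity - daily_incoming
    if spare <= 0:
        return None
    return math.ceil(backlog / spare)


# From the text: over 1700 affected observations, about 40% retransmitted.
print(retransmit_estimate(1700, 0.40))  # 680

# Placeholder rates (observations per day), invented for illustration only.
print(days_to_clear(backlog=1700, daily_capacity=300, daily_incoming=240))
```

The point of the second function is that recovery time depends on spare capacity, not total capacity: a pipeline sized only for the steady-state load has no margin for reprocessing.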
Lesson 10
Issue: Incompatible requirements for data processing.
Originator: Albert Holm
Priority:
1 – Required
2 – Highly Desired
3 – Nice to have
4 – Already Implemented
Lesson Learned: Proposal specifications for several ACS SMOV programs
resulted in directions to deliver products that were incompatible with the kind
of processing designated. Specifically in this case, proposals 8948, 9020, and
9022 used Repeat Obs specifications for internal observations not intended to
be processed through calibration. The front-end systems convert the Repeat
Obs construction into a requirement that the observations be associated.
Association is performed in the calibration routines that these observations
were required to skip. The result was the production of inconsistent products
that the archive system rejected.
Eventually about 500 ACS observations failed to be ingested because of this
inconsistency. The fix involved manually altering PMDB tables, reprocessing
the data, and manually altering some keywords in the products. The
reprocessing load may have contributed to the meltdown of the data
processing systems on May 15 and 19.
Recommendation: Proposal specifications used during SMOV should not
differ from the kinds of specifications used during Ground System Testing. By
testing in the same manner as will be used in flight, potential data processing
errors can be uncovered and resolved before SMOV.
Implementation Plan: The Proposal Implementation Team will identify
proposal/ground system discrepancies during the proposal processing
phase.
Actionee: PIT
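The inconsistency described in this lesson is mechanical enough to be caught by a check at proposal-processing time: an exposure that uses Repeat Obs (and therefore requires association) but is also flagged to skip calibration (where association is performed) cannot yield consistent products. A minimal Python sketch of such a check follows. The record layout, field names, and function name are hypothetical simplifications, not the actual proposal or PMDB schema, and the proposal IDs other than 8948 are made up.

```python
from dataclasses import dataclass


@dataclass
class ExposureSpec:
    """Hypothetical, simplified view of one proposal exposure line."""
    proposal_id: int
    uses_repeat_obs: bool   # Repeat Obs construction implies association
    skip_calibration: bool  # internal data not meant to be calibrated


def find_conflicts(specs):
    """Return specs that require association but skip the step that does it."""
    return [s for s in specs if s.uses_repeat_obs and s.skip_calibration]


specs = [
    ExposureSpec(8948, uses_repeat_obs=True, skip_calibration=True),
    ExposureSpec(1, uses_repeat_obs=True, skip_calibration=False),  # made-up ID
    ExposureSpec(2, uses_repeat_obs=False, skip_calibration=True),  # made-up ID
]
for s in find_conflicts(specs):
    print(f"proposal {s.proposal_id}: association required but calibration skipped")
```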
Lesson 11
Issue: Reliability of processing and archiving systems
Originator: G. De Marchi (ACS group)
Priority:
1 – Required
Lesson Learned: The ACS is the first HST instrument to generate daily data
volume comparable to the downlink capabilities of the HST ground system.
During SMOV3B, archiving (DADS) and retrieval (Archive) were at times
seriously hindered and delayed by ingest and retrieval times.
The ACS group had planned for handling large SMOV data volumes by
having a group “on-line” archive so that all SMOV data was always
available to the group. However, early access to ACS data, particularly for
the IDT, was sometimes compromised by the retrieval times. The ACS group
filled this gap by making most SMOV data directly available to the IDT.
Recommendation: Consider making SMOV data directly available (e.g.,
ftp from OPUS) to the science instrument teams. This is particularly
important for SM4 and the WFC3, where three new detector systems will
generate a considerable increase in the amount and volume of the data that
the ground system will have to downlink and ingest every day, in addition
to the data produced by the ACS. See also the
recommendation for Lesson Learned 9 for predicting data processing
requirements.
Implementation Plan: If deemed necessary, and to the extent feasible, plan
such special data transfers during the SMOV operations planning phase. See,
also, the implementation plan for L.L. 9 for ground system engineering in
preparation for servicing missions and L.L. 12 for instrument team local
storage.
Actionee: SMOV Operations Planning Team (including OPUS, and SI
teams)
Lesson 12
Issue: Pipeline processing logic
Originator: G. De Marchi (ACS group)
Priority:
2 – Highly Desired
Lesson Learned: During the early days of operation of the ACS it was noted
that the time required to process some calibration data (mostly internal) could
have been considerably reduced by adjustment of switches and keywords. The
baseline implementation of these switches was based on rules adopted by
STIS, without regard for processing time and data volume. These proved
unworkable for ACS due to excessive data volume and processing times.
Recommendation: When the pipeline is designed and implemented for new,
high data volume instruments, additional care should be paid to ensure that
the software performs only essential operations.
Implementation Plan: A representative of each science instrument team
provides the switch and keyword requirements to STScI DST.
Actionee: SI teams and DST
Lesson 13
Issue: Local archival facility
Originator: G. De Marchi (ACS group)
Priority:
4 – Already Implemented
Lesson Learned: In preparation for the SMOV3B phase, the ACS group set
up a large capacity storage area to host the raw and processed data files. The
OPUS branch agreed to deliver to this disc area, on a daily basis, all ACS
data received and processed before sending them to the Archive. This
arrangement allowed uninterrupted access to the ACS data all through the
SMOV period, regardless of the status of the Archive and proved particularly
useful for timely processing and analysis of all images, including the early
release observations.
Recommendation: Future HST instruments may find it useful to set up a
local storage area, on a fast network, for efficient and prompt data processing
during SMOV.
Implementation Plan: SI teams to provide if deemed necessary.
Actionee: SI teams
Lesson 14
Issue: Finishing up SMOV proposals – outstanding items forgotten and not completed
Originator: Ron Pitts
Priority:
1 – Required
2 – Highly Desired
3 – Nice to have
4 – Already Implemented
Lesson Learned: To ensure that special commanding/ISQL, scheduling, and
special data handling requirements are completed correctly and that special
commanding executes as expected, proposals may need precise comments in
the proposal header or on exposure lines, or certain optional parameters or
special requirements specified in the proposal. The actual work may be done
months later, so the proposal must be precisely written. In SMOV3B, after a
PIT meeting, a proposal would sometimes be “almost” correct with one or two
outstanding items. In such cases, the PI/PC was instructed to complete the
outstanding items and resubmit the proposal without having to return to
the PIT. On a number of proposals, other work, intervening vacations, etc.
resulted in the final changes not being made. As a result, several proposals
needed special commanding / scheduling changes at the last minute, when
deadlines for launch were near, and others were not scheduled optimally
because a critical comment concerning a scheduling / data handling
requirement had been left out. In short, certain problems that had been
caught in the PIT process happened anyway.
Recommendation: Either proposals need to be returned to the PIT unless
NO changes are required, or an action list needs to be maintained by the
PIT meeting coordinator who writes up the meeting minutes. If the action
item list option is used, it should also contain the dates by which the
actions are to be completed, and it should be visible to the PIT.
Implementation Plan: PIT team to modify its procedures according to the
recommendation.
Actionee: STScI Proposal Implementation Team (PIT)
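The action-list option in the recommendation can be pictured as a small record per outstanding item, carrying an owner and a completion date, plus a query for items that have slipped. The Python sketch below shows one possible shape. The field names, proposal IDs, dates, and descriptions are all invented for illustration and do not reflect any actual PIT tooling.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    """One outstanding change agreed at a PIT meeting (hypothetical layout)."""
    proposal_id: str
    description: str
    owner: str
    due: date
    done: bool = False


def overdue_items(items, today):
    """Open items whose completion date has already passed."""
    return [i for i in items if not i.done and i.due < today]


# Entries are invented for illustration only.
items = [
    ActionItem("P-001", "add scheduling comment to header", "PI", date(2002, 1, 10)),
    ActionItem("P-002", "set optional parameter", "PC", date(2002, 2, 1), done=True),
    ActionItem("P-003", "add data handling requirement", "PI", date(2002, 3, 1)),
]
for item in overdue_items(items, today=date(2002, 2, 15)):
    print(f"{item.proposal_id}: '{item.description}' overdue since {item.due}")
```

Reviewing the overdue list at each PIT meeting would keep unfinished items visible without requiring every proposal to return to the full team.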
Lesson 15
Issue: Calibration module in SMGT
Originator: G. De Marchi (ACS group)
Priority:
1 – Required
Lesson Learned: In the course of SMOV3B a number of special modes were
employed to test ACS functionality that had not previously been tested on the
ground system. The primary reason is that these modes in general were not
implemented in the ACS pipeline and were run during SMOV to provide
“special-mode” baseline performance data. For example, only certain
amplifier readout configurations are supported by the ACS ground system.
However, following analysis of a STIS readout noise problem after SM3B, it
was decided that baseline bias and readout noise data should be obtained for
each amplifier reading out through the whole CCD, so that baseline
figures would be available for future diagnostics if required. Such
special-mode observations were not carried in the SMGT, as this test is designed to
simulate normal daily science operations and validate all standard
engineering modes.
Recommendation: If required, add a special calibration/checkout module to
SMGT and test carefully all modes, including special “one-time only” modes
whose use can be conceived of during the SMOV phase. It may be that we
still allow these observations to fail gracefully in the pipeline.
Implementation Plan: Ground test requirements will be identified in the
detailed SMOV plan. SMGT group will assist PIT team in determining the
appropriate test.
Actionee: STScI SMOV Lead and STScI PIT supported by an SMGT contact
Lesson 16
Issue: The need to power down SI and PCS high-voltages as risk mitigation against NCS
cooldown leakage was identified very late in the planning process.
Originator: Biagetti
Priority:
1 – Required
2 – Highly Desired
3 – Nice to have
4 – Already Implemented
Lesson Learned: While the need to disable SI MAMA high-voltages as risk mitigation during
NICMOS cooldown was identified early in the planning process and incorporated into the
SMOV plan, the possible risk to FGS high-voltages was identified only a few weeks before
launch. The resulting decision to turn off FGS and FHST high voltages during the cooldown
caused major perturbations to the early SMOV plan, requiring late re-planning, manual
intervention in the timeline to implement the replan, and an Observatory PCS system that was
inhibited for several days.
Recommendation: Identify system interdependencies and impacts as early as possible in the
SMOV planning process.
Implementation Plan: Include system implications as a normal part of the SMOV planning phase.
Actionee: STScI/Project SMOV Planning Team