The Challenge of Software Cost Estimation for Long-Term Information Preservation of Earth Science Data

Bruce R. Barkstrom, Asheville, NC
Paula L. Sidell, League City, TX

Outline
● The Accountant's View of Earth Science Data
● Information Preservation
  ● Threats create contingent liabilities that are needed to prevent asset impairments
● The Role of Software Development in Earth Science Information Accounting
● Cost Modeling
  ● The necessity for economy in information preservation
  ● An interesting factor: the cost of control

Basic Accounting Questions
● Key Question: What value should the government carry on its books for Earth science data in its archives?
● Accountant Answers:
  1. Use “market value” - but Earth science data is a public good, without a market (except maybe high-resolution imagery)
  2. Use “cost of reproduction” - but Earth science data is not “reproducible” - if you miss it, it's gone
  3. Use “Net Present Value of stream of future benefits” - but the value of science data is likely not just economic

An Accounting View of the Value of Earth Science Data
● Earth science data appears to fall into the category of “cultural, artistic, and scientific” assets
● Accounting treatment of assets
  ● Non-depreciable
  ● Public good
  ● Value arises from data use
  ● Launch vehicles, satellites, and instruments are “equipment costs” - “sunk expenses” at end of life
  ● Some “operating expenses” may be viewed as “insurance against value impairment”
  ● Direct cost of software to interpret data and of calibration and validation (cal-val) activities is readily measured
● Earth Science Data Value represents the cost of the investment in making the data useful

Data Uses for Societal Benefits
● USGEO (and related programs) have identified 9 societal benefit areas
● Data uses are usefully divided by time scale of required observation period (sample uses)
● Table: USGEO Societal Benefit Area versus time scale of use
  ● Time scales (columns): Immediate Use, 1-2 Yr Use, 4-25 Yr Use, 25-100 Yr Use, Doomsday Use
  ● Benefit areas (rows): Disasters, Human Health, Energy, Climate, Water, Weather, Ecology, Agriculture, Oceans, Biodiversity
  ● Sample uses (cell entries): Warning, Power Plant Siting, Mitigation and Adaptation, Infrastructure Planning, Info Service, Precision Ag Service, Fisheries Info Service, Process Studies, Crop Insurance

The Accountant's View of Value
● Accountants view an organization in terms of the flow of funds
● Funds are placed in a Chart of Accounts
● Frequent changes in the Chart of Accounts are not allowed in standard rules of accounting
● Funding balance requires ΔA + ΔL = I - E, or: the sum of asset changes plus the sum of liability changes equals the difference between income and expenses
● A Basic Chart of Accounts (Debit and Credit columns)
  ● Assets [A]
  ● Fund Balance [F]
  ● Liabilities [L]
  ● Expenses [E]
  ● Income [I]

Information Asset Valuation

Activity                                 Expected Interval [yr]   Accounting Basis   Account Type
Satellite Missions
  Instrument Dev. [T]                    5                        M                  DC
  Instrument Char. & Cal. Facil. [T]     5                        F                  DC
  Instrument Char. & Cal. [I]            0.2                      M                  O
  Instr.-Sat. Integration                0.5                      M                  O
  Sat.-Vehicle Integration               0.5                      M                  O
  Initial Sat./Instr. Checkout           0.2                      M                  O
  Sat./Instr. Ops.                       LOM                      F                  O
In Situ Data Networks
  In Situ Data Site Dev. [T]             2                        M                  HS
  In Situ Data Site Ops.                 LOM                      F                  O
Science Data Production
  Science Algorithm Dev. [T]             LOD                      F                  SD
  Science Data Validation [I]            LOM                      F                  O
  Science Data Production [T]            LOM                      F                  O
  Science Data Prod. Facility [T]        LOM                      F                  HS
Archival Activities
  Archive Ingest [I]                     LOM                      F                  O
  Archive Curation [T]                   LOD                      F                  O
  Archive Facilities [T]                 LOD                      F                  HS
  Archive Operations                     LOD                      F                  O
User Access

● For Activity
  ● T: Activity produces a Tangible asset, such as hardware, software source code, or documentation
  ● I: Activity produces an Intangible asset, primarily information that may be used by researchers, decision makers, or other data users
● For Expected Interval
  ● LOM: Life Of Mission, which may be taken as ten years for a single satellite or instrument, and which may increase to more than a century for operational environmental observation capability
  ● LOD: Life Of Data, which NARA defines as 75 years beyond scientific research use. Here, we take this time period to be 200 years, give or take
● For Accounting Basis
  ● M: Modified Accrual accounting basis
  ● F: Full Accrual accounting basis
● For Account Type
  ● DC: Development and Construction
  ● O: Operations
  ● HS: Hardware and Software, including accounts for initial capitalization, depreciation, and refresh/upgrades
  ● SD: Specialized scientific software Development

Practical Asset Valuation
● Three Standard Methods
  ● Acquisition Cost
    ● Ultimate Residual Cost attributable to investment in calibration, data production, validation, and reprocessing
  ● Replacement Cost
    ● No possible replacement of lost observations
  ● Net Present Value of Flow of Future Value
    ● Non-economic nature of data use of a public good makes method moot
● Common-Sense Approach
  ● Use of data must be timely, relevant, and reliable
  ● Value of data based on cost of expert interpretation as embedded in software and cal-val cost

Expense Accounts
● Non-depreciable Asset subject to Asset Impairment
  ● Expenses are equivalent to buying insurance to reduce loss of value due to asset impairment
● Standard Approach to Insurance Cost
  1. Identify threats
  2. Estimate probability of loss and probable value of loss if threat materializes
  3. Develop affordable strategy to mitigate risk

The Challenges of Preservation
● Stringent Requirements

Probability of Loss per Year, p     Probability of Survival to Year 201
10.00%                              6.4 × 10^-10
1.00%                               0.13
0.01%                               0.98

● Potential Loss Mechanisms
  ● Institutional instability and funding flow changes
  ● Operator errors
  ● Media, hardware, and software errors
  ● Loss of context, including software obsolescence
  ● IT security incidents with loss of user trust
  ● Evolution of hardware and software

A Refined Balance Sheet
● Debit
  ● Assets: Data, Metadata, Documentation, Human Capital
  ● General Assets: Buildings and Related Assets
● Credit
  ● Capitalization
  ● Liabilities: IT Security Incidents, Loss of Context, Format and Software Obsolescence
● Expenses [offsets to avoid asset impairment]
  ● Replication (between agencies and governments)
  ● Operator Training and Monitoring
  ● Automation and Automation Testing
  ● Hardware Evolution
  ● Software Evolution
  ● Power and Air Conditioning
  ● Preservation Planning
  ● Administration
● Income

The Role of Cost Modeling
● Three Eras in an Earth Science Data Collection:
  ● Science Software Development
  ● Production and Validation
  ● Archive Preservation Management
● Three Kinds of Cost Modeling Challenges
  ● In 1st phase: Initial ignorance
  ● In 2nd phase: Unplannable and uncontrollable activities
  ● In 3rd phase: Level of effort with minimum cost

Contingency Management
● Working Hypothesis: Unplannable and uncontrollable activities arise because of
  ● Initial ignorance (finite number of bugs)
  ● Changes in environment (long-term = Annual Change Traffic in classic COCOMO)
● Unplannable and uncontrollable activities create need for resource contingencies
● Approaches to Contingency Management
  ● Squeeze out unplannable and uncontrollable activities = Waterfall Lifecycle
  ● Reduce time in discovery-fix development-fix deployment = Agile Lifecycle

The Cost of Control
● Attempting to “Squeeze Out” Uncertainty Incurs Control Costs
  ● Education (inculcate “right behavior” through training)
  ● Communication (staff meetings as “consensus builder”)
  ● Enforcement (hiring and firing costs)
● Choice of Lifecycle
  ● Rationale: balance control costs against benefits of approach
  ● Probable optimal strategies:
    ● Low likelihood of unplannable and uncontrollable activities - use Waterfall (concentrate on control)
    ● Moderate likelihood of unplannable and uncontrollable activities - use Spiral (balance control costs against contingency costs)
    ● High likelihood of unplannable and uncontrollable activities - use Agile (concentrate on rapid discovery and removal of contingencies)
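
The survival figures quoted under "The Challenges of Preservation" appear consistent with treating each year as an independent trial with annual loss probability p, so that the probability of surviving n years is (1 - p)^n. The sketch below assumes n = 201 (reading "Survival to Year 201" as 201 annual trials) and independence between years; both are assumptions rather than statements from the slides, and the function name is illustrative.

```python
def survival_probability(p_loss_per_year: float, years: int = 201) -> float:
    """Probability that data survive `years` independent annual loss trials.

    Assumes a constant annual loss probability and independence between
    years (an assumption made for this sketch, not taken from the slides).
    """
    if not 0.0 <= p_loss_per_year <= 1.0:
        raise ValueError("p_loss_per_year must be between 0 and 1")
    return (1.0 - p_loss_per_year) ** years


if __name__ == "__main__":
    # Annual loss probabilities listed on the slide: 10%, 1%, 0.01%.
    for p in (0.10, 0.01, 0.0001):
        print(f"p = {p:.2%}  survival = {survival_probability(p):.2g}")
```

Under these assumptions the results land close to the values on the slide (roughly 6 × 10^-10, 0.13, and 0.98), which illustrates why the annual loss probability has to be held near 0.01% for a 200-year preservation horizon.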
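
The funding balance stated under "The Accountant's View of Value", that the sum of asset changes plus the sum of liability changes equals income minus expenses, can be checked mechanically for a reporting period. The following is a minimal sketch; the `PeriodTotals` type, the function names, and all figures are hypothetical and chosen only to illustrate the relation.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class PeriodTotals:
    """Hypothetical totals for one reporting period (illustrative only)."""
    asset_changes: Sequence[float]      # changes in each asset account
    liability_changes: Sequence[float]  # changes in each liability account
    income: float                       # I
    expenses: float                     # E


def funding_imbalance(t: PeriodTotals) -> float:
    """Return (sum of asset changes + sum of liability changes) - (I - E).

    A result of zero means the funding balance relation holds.
    """
    return sum(t.asset_changes) + sum(t.liability_changes) - (t.income - t.expenses)


def is_balanced(t: PeriodTotals, tol: float = 0.01) -> bool:
    """True when the relation holds to within a rounding tolerance."""
    return abs(funding_imbalance(t)) <= tol


if __name__ == "__main__":
    # Made-up figures: asset changes 40 + 10, liability change -5,
    # income 100, expenses 55, so both sides equal 45.
    period = PeriodTotals([40.0, 10.0], [-5.0], income=100.0, expenses=55.0)
    print("balanced:", is_balanced(period))
```

A nonzero result from `funding_imbalance` flags a period whose postings need review before the accounts are closed.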
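
Step 2 of the "Standard Approach to Insurance Cost" (estimate the probability of loss and the probable value of loss for each threat) is commonly summarized as an expected annual loss, probability times value, which can then be used to rank threats before choosing an affordable mitigation strategy. The sketch below uses that common formulation as an assumption; the `Threat` type, the function names, and every number are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Threat:
    """One potential loss mechanism with illustrative, made-up estimates."""
    name: str
    annual_probability: float  # estimated chance the threat materializes in a year
    loss_if_realized: float    # estimated value lost if it does (any currency unit)


def expected_annual_loss(threat: Threat) -> float:
    """Expected loss per year: probability of loss times value of loss."""
    return threat.annual_probability * threat.loss_if_realized


def rank_by_expected_loss(threats: List[Threat]) -> List[Threat]:
    """Return threats ordered from largest to smallest expected annual loss."""
    return sorted(threats, key=expected_annual_loss, reverse=True)


if __name__ == "__main__":
    # Hypothetical estimates for two of the loss mechanisms named on the slides.
    threats = [
        Threat("Media, hardware, and software errors", 0.05, 200.0),
        Threat("Operator errors", 0.02, 1000.0),
    ]
    for t in rank_by_expected_loss(threats):
        print(f"{t.name}: expected annual loss = {expected_annual_loss(t):.1f}")
```

Ranking by expected loss is only a starting point: a threat with a small probability but an irreplaceable loss may still deserve mitigation ahead of its position in the list.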