CHAPTER 1: INTRODUCTION

Contractors use a variety of techniques to develop estimates for proposals submitted to the Government. Some of the more frequently employed techniques include analogous, bottom-up, and parametric estimating. A primary responsibility of a project cost estimator is to select the estimating methodology that most reliably estimates program costs while making the most economical use of the organization's estimating resources. In certain circumstances, parametric estimating techniques can provide reliable estimates at lower cost and in less time than other traditional estimating techniques.

Over the last several years, both Industry and Government have focused on ways to maximize the use of historical data in the estimating process. During this same time frame, both have also sought ways to reduce the costs associated with proposal preparation, evaluation, and negotiation. Industry and Government representatives have offered parametric estimating as a technique that, when implemented and used correctly, can produce reliable estimates for proposals submitted to the Government at significantly reduced cost and cycle time.

Data issues seem to be the greatest concern regarding the use of parametric estimating methods, particularly in regard to the Truth in Negotiations Act (TINA). TINA requires that cost or pricing data be certified as current, accurate, and complete as of the date of negotiation or another agreed-to date as close as practicable to the date of negotiation. TINA requires contractors to provide the Government with all the facts available at the time of certification, or an otherwise agreed-to date. Properly calibrated and validated parametric techniques comply with the requirements of TINA.

Parametric estimating is a technique that uses validated relationships between a project's known technical, programmatic, and cost characteristics and the known historical resources consumed during the development, manufacture, and/or modification of an end item. A number of parametric techniques exist that practitioners can use to estimate costs. These techniques include cost estimating relationships (CERs) and parametric models.

For the purpose of this handbook, CERs are defined 1 as mathematical expressions or formulas used to estimate the cost of an item or activity as a function of one or more relevant independent variables, also known as cost drivers. Generally, companies use CERs to estimate costs associated with low-dollar items that are time- and cost-intensive to estimate using conventional techniques. For example, companies often use CERs to estimate costs associated with manufacturing support, travel, publications, or low-dollar material. Companies with significant proposal activity have negotiated Forward Pricing Rate Agreements (FPRAs) 2 for certain CERs. Use of advance agreements such as FPRAs further streamlines the acquisition process and helps reduce costs. Chapter 3, Cost Estimating Relationships, provides examples, guidance, and best practices for implementing these techniques. Chapter 7, Regulatory Compliance, more fully discusses the use of FPRAs and other advance agreements.

Parametric models are more complex than CERs because they incorporate many equations, ground rules, assumptions, logic, and variables that describe and define the particular situation being studied and estimated. Parametric models make extensive use of databases that catalog program technical and cost history.
Parametric models can be developed internally by an organization for unique estimating needs, or they can be obtained commercially. Typically, the databases are proprietary to the contractor or vendor; however, a vendor will generally share a description of the data in the database in order to build confidence with its users. Parametric models can be used to discretely estimate certain cost elements (e.g., labor hours for software development, software sizes such as lines of code (LOC) or function points), or they can be used to develop estimates for hardware (e.g., radar systems, space shuttle spare parts) and/or software systems (e.g., software for air traffic control systems). When implemented correctly and used appropriately, parametric models can serve as the primary basis of estimate (BOE). The following table identifies the chapters containing specific examples, guidance, and best practices related to the implementation of various types of parametric estimating methods:

Chapter      Model Type
Chapter 3    Cost Estimating Relationships (CERs)
Chapter 4    Company-Developed Models
Chapter 5    Commercial Hardware Models
Chapter 6    Commercial Software Models

Parametric techniques have been accepted by Industry and Government organizations for many years, for use in a variety of applications. For example, many organizations have experienced parametricians on staff who regularly use parametrics to develop independent estimates (e.g., comparative estimates or rough order of magnitude estimates) and life cycle cost estimates (LCCEs). In addition, Industry and Government often use these techniques to perform trade studies such as cost as an independent variable (CAIV) analyses and design-to-cost (DTC) analyses. Parametric techniques are also used to perform cost or price analyses. In fact, the Federal Acquisition Regulation (FAR) identifies parametrics as an acceptable price analysis technique in 15.404-1(b)(2)(iii). In addition, organizations have used parametric estimating techniques to develop estimates, or as secondary methodologies that serve as "sanity checks" on the primary estimating methodology, for proposals not requiring cost or pricing data. Chapter 11, Other Parametric Applications, provides an overview of these and other uses.

Until recently, however, the use of parametric estimating techniques to develop estimates for proposals subject to cost or pricing data 3 was limited for a variety of reasons. These reasons included: cultural resistance, because many people in the acquisition community expressed greater comfort with the more traditional methods of estimating, including the associated BOE; and limited availability of guidance on how to prepare, evaluate, and negotiate proposals based on parametric techniques. Nevertheless, many Industry and Government representatives recognized parametrics as a practical estimating technique that can produce credible cost or price estimates. By making broader use of these techniques, they anticipated realizing some of the following benefits: improvement in the quality of estimates, due to heavier reliance on historical data and greater consistency in the estimating process; streamlined data submission requirements, decreasing the cost associated with preparing supporting rationale for proposals; reduced proposal evaluation cost and cycle time; and decreased negotiation cost and cycle time through quicker proposal updates.
After achieving some success with the broader uses of parametric techniques (e.g., independent estimates, trade studies), Industry saw the need to team with the Government to demonstrate that parametrics are an acceptable and reliable estimating technique. In December 1995, the Commander of the Defense Contract Management Command (DCMC) and the Director of the Defense Contract Audit Agency (DCAA) sponsored the Parametric Estimating Reinvention Laboratory. The purpose of the Reinvention Laboratory was to test the use of parametric estimating techniques on proposals and recommend processes to enable others to implement these techniques. The primary objectives of the Reinvention Laboratory included: Identifying opportunities for using parametric techniques; Testing parametric techniques on actual proposals submitted to the Government; Developing case studies based on the best practices and lessons learned; and Establishing formal guidance to be used by future teams involved in implementing, evaluating, and/or negotiating parametrically based estimating systems or proposals. Thirteen Reinvention Laboratory teams (as referenced in the Preface) tested and/or implemented the full spectrum of parametric techniques. The Industry and Government teams used these techniques to develop estimates for a variety of proposals, including those for new development, engineering change orders, and follow-on production efforts. The estimates covered the range of use from specific elements of cost to major-assembly costs. The teams generally found that using parametric techniques facilitated rapid development of more reliable estimates while establishing a sound basis for estimating and negotiation. In addition, the teams reported proposal preparation, evaluation, and negotiation cost savings of up to 80 percent; and reduced cycle time of up to 80 percent. The contractors, with feedback from their Government team members, updated or revised their estimating system policies and procedures to ensure consistent production of valid data and maintenance of the tools employed. The Reinvention Laboratory Closure Report 4 provides details on the best practices for implementing parametric techniques and is included in this edition of the handbook as Appendix F. The lab results have also been integrated throughout this handbook in the form of examples, best practices, and lessons learned with respect to implementing, evaluating, and negotiating proposals based on parametric techniques. As an example, one of the overarching best practices demonstrated by the Reinvention Laboratory is that parametric techniques (including CERs and models) should be implemented as part of a company’s estimating system. In a 1997 report entitled "Defense Contract Management" (report number GAO/HR-97-4), the General Accounting Office (GAO) stated that "contractor cost estimating systems are a key safeguard for obtaining fair and reasonable contract prices when market forces do not provide for such determinations." The DOD estimating system requirements are set forth in the Department of Defense FAR Supplement (DFARS), 215.811-70. As shown in Figure 1-1, a parametric estimating system includes: Data from which the estimate is based (i.e., historical data to the maximum extent possible); Guidance and controls to ensure a consistent and predictable system operation; Procedures to enforce the consistency of system usage between calibration and forward estimating processes; and Experienced/trained personnel. 
Chapter 7, Regulatory Compliance, and Chapter 9, Auditing Parametrics, provide detailed discussions on evaluating parametric estimating system requirements. Once a parametric estimating system has been effectively implemented, use of these techniques on proposals can result in significantly reduced proposal development, evaluation, and negotiation costs, and associated cycle time reductions.

Figure 1-1: Parametric Estimating System Elements

The results of the Reinvention Laboratory also demonstrated that the use of integrated product teams (IPTs) is a best practice for implementing, evaluating, and negotiating new parametric techniques. Generally, each IPT included representatives from the contractor's organization as well as representatives from the contractor's major buying activities, DCMC, and DCAA. Using an IPT process, team members provided feedback on a real-time basis on issues such as the calibration and validation processes, estimating system disclosure requirements, and Government evaluation criteria. Detailed Government evaluation criteria are included in this Second Edition of the Parametric Estimating Handbook in Chapter 9, Auditing Parametrics, and Chapter 10, Technical Evaluation of Parametrics. By using an IPT process, contractors were able to address the concerns of Government representatives up front, before incurring significant costs associated with implementing an acceptable parametric estimating system or developing proposals based on these techniques. The Reinvention Laboratory also showed that when key customers participated with the IPT from the beginning, the collaboration greatly facilitated their ability to negotiate fair and reasonable prices for proposals based on parametric techniques.

The use of parametric estimating techniques as a BOE for proposals submitted to the Government is expected to increase over the coming years for many reasons, including: The results of the Reinvention Laboratory demonstrated that parametric estimating is a tool that can significantly streamline the processes associated with developing, evaluating, and negotiating proposals based on cost or pricing data. Parametric estimating techniques can also be used as a basis of estimate for proposals based on information other than cost or pricing data, thereby broadening the applicability of this estimating technique. Parametric estimating is referenced in the FAR: FAR 15.404-1(c)(2)(i)(C) states that the Government may use various cost analysis techniques and procedures to ensure a fair and reasonable price, including verifying the reasonableness of estimates generated by appropriately calibrated and validated parametric models or CERs. The detailed guidance, case studies, and best practices contained in this handbook provide an understanding of the "how-tos" of parametric estimating. This handbook should help all those involved in the acquisition process overcome barriers related to their lack of familiarity with parametrics. However, it is also recognized that outside IPT training may be needed on implementation, maintenance, evaluation, and negotiation techniques for parametric-based estimating systems and proposals. Appendix E, Listing of Professional Societies/Web Sites/Educational Institutions, provides additional sources where information on parametrics (including available training courses) can be obtained.
This Handbook is designed to provide greater familiarization with parametric estimating techniques, guidance on acceptable use of this estimating methodology, and methods for evaluation. The organization of this Second Edition mirrors the process used in developing a parametric estimating capability. Chapter 2 discusses data collection and normalization. Chapters 3 through 6 discuss various parametric modeling techniques ranging from CER development to more robust models, both proprietary and commercial. Chapter 7 addresses regulatory issues, while Chapters 8 through 10 discuss the roles of various Government organizations and functional specialists. The Handbook concludes with Chapter 11, which discusses other uses of parametric estimating techniques. The Appendices provide supplementary information including a glossary of terms, a listing of web site resources, and other helpful information.

1 See Chapter 3 for complete definitions of the parametric terminology used in this handbook.
2 An FPRA is a structured agreement between the Government and a contractor to make certain rates, factors, and estimating relationships available for pricing activities during a specified period of time. See Chapter 7 for additional information on this topic.
3 In accordance with the FAR and accepted usage, the phrase "cost or pricing data," when used in this specific combination, refers to data that is/will be subject to certification under TINA.
4 The Reinvention Laboratory Closure Report is an executive summary, which discusses the criteria a company should use to determine if parametrics would be beneficial, and best practices for implementing these techniques.

CHAPTER 2: DATA COLLECTION AND ANALYSIS

Chapter Summary

All parametric estimating techniques, including cost estimating relationships (CERs) and complex models, require credible data before they can be used effectively. This chapter provides an overview of the processes needed to collect and analyze data to be used in parametric applications. The chapter also discusses data types, data sources, and data adjustment techniques, including normalization.

Objective/Purpose

Discuss the importance of collecting historical cost and noncost (i.e., technical) data to support parametric estimating techniques. Identify various sources of information that can be collected to support data analysis activities. Describe the various methods of adjusting raw data so it is consistent (i.e., data normalization).

I. Generalizations

Parametric techniques require the collection of historical cost data (including labor hours) and technical noncost data. Data should be collected and maintained in a manner that provides a complete audit trail with expenditure dates so that dollar-valued costs can be adjusted for inflation. While there are many universal formats for collecting data, an example of one commonly used by Industry is the Work Breakdown Structure (WBS). The WBS provides for uniform definitions and collection of cost and technical information. It is discussed in detail in MIL-HDBK-881, DOD Handbook – WBS (Appendix B contains additional information). Other data collection formats may follow the process cost models of an activity based costing (ABC) system. Regardless of the method, a contractor's data collection practices should be consistent with the processes used in estimating, budgeting, and executing the projects on which the data was collected.
If this is not the case, the data collection practices should contain procedures for mapping the costs used in the database to the specific model elements. The collecting point for cost data is generally the company's management information system (MIS), which in most instances contains the general ledger and other accounting data. All cost data used in parametric techniques must be consistent with, and traceable back to, the original collecting point. The data should also be consistent with the company's accounting procedures and cost accounting standards.

Technical noncost data describe the physical, performance, and engineering characteristics of a system, sub-system, or individual item. For example, weight is a common noncost variable used in CERs and parametric estimating models. Other typical examples of cost driver variables include horsepower, watts, thrust, and lines of code. A fundamental requirement for including a noncost variable in a CER is that it be a significant predictor of cost (i.e., a primary cost driver). Technical noncost data come from a variety of sources, including the MIS (e.g., materials requirements planning (MRP) or enterprise resource planning (ERP) systems), engineering drawings, engineering specifications, certification documents, interviews with technical personnel, and direct experience (e.g., weighing an item). Schedule, quantity, equivalent units, and similar information come from Industrial Engineering, Operations Departments, program files, or other program intelligence.

Once collected, data need to be adjusted for items such as production rate, improvement curve, and inflation. This is also referred to as the data normalization process. Relevant program data, including development and production schedules, quantities produced, production rates, equivalent units, breaks in production, significant design changes, and anomalies such as strikes, explosions, and natural disasters, are also necessary to fully explain any significant fluctuations in the historical data. Such historical information can generally be obtained through interviews with knowledgeable program personnel or through examination of program records. Any fluctuations may exhibit themselves in a profile of monthly cost accounting data. For example, labor hour charging may show an unusual "spike" or "depression" in the level of charged hours. Data analysis and normalization processes are described in further detail later in the chapter. First, it is important to identify data sources.

II. Data Sources

Specifying an estimating methodology is an important early step in the estimating process. The basic estimating methodologies (analogy, catalog prices, extrapolation, grassroots, and parametric) are all data-driven. To use any of these methodologies, credible and timely data inputs are required. If the data required for a specific approach are not available, then that estimating methodology cannot be used effectively. Because of this, it is critical that the estimator identify the best data sources. Figure 2-1 shows several basic sources of data and whether each is considered a primary or secondary source of information. When preparing a cost estimate, estimators should consider all credible data sources. However, primary sources of data should be given the highest priority whenever feasible. The table below uses the following definitions of primary and secondary sources of data for classification purposes: Primary data are obtained from the original source.
Primary data are considered the best in quality and ultimately the most reliable. Secondary data are derived (possibly "sanitized") from primary data and are, therefore, not obtained directly from the source. Because secondary data are derived (actually changed) from the original data, they may be of lower overall quality and usefulness.

Sources of Data

Source                       Source Type (Primary or Secondary)
Basic Accounting Records     Primary
Cost Reports                 Either
Historical Databases         Either
Functional Specialist        Either
Technical Databases          Either
Other Information Systems    Either
Contracts                    Secondary
Cost Proposals               Secondary

Figure 2-1: Sources of Data

Collecting the necessary data to produce an estimate, and evaluating the data for reasonableness, is a critical and often time-consuming step. As stated, it is important to obtain cost information, technical information, and schedule information. The technical and schedule characteristics of programs are important because they drive the cost. For example, assume that cost data from a completed program are available and a program engineer has been asked to relate that program to the one being estimated. If the engineer is not provided with specific technical and schedule information that defines the completed program, the engineer will not be able to accurately compare the programs, nor respond to questions a cost estimator may have regarding the product being estimated in comparison to the historical data. The bottom line is that cost analysts and estimators are not solely concerned with cost data. They need technical and schedule information in order to adjust, interpret, and lend credence to the cost data being used for estimating purposes.

A cost estimator has to know the standard sources where historical cost data exist. This knowledge comes from experience and from the people who are available to answer key questions. A cost analyst or estimator should constantly search out new sources of data. A new source might keep cost and technical data on some item of importance to the current estimate. Internal contractor information may also include analyses such as private corporate inflation studies or "market basket" analyses. A market basket analysis is an examination of the price changes in a specified group of products. Such information provides data specific to a company's product line(s) that could be relevant to a generic segment of the economy as a whole. Such specific analyses would normally be prepared as part of an exercise to benchmark Government-provided indices, such as the consumer price index, and to compare corporate performance to broader standards.

In addition, some sources of data may be external. External data can include databases containing pooled and normalized information from a variety of sources (other companies or public record information). Although such information can often be useful, weaknesses of these sources can include: no knowledge of the manufacturing and/or software processes used and how they compare to the current scenario being estimated; no knowledge of the procedures (i.e., accounting) used by the other contributors; no knowledge of how anomalies were treated in the original data; and the inability to accurately forecast future indices. It is important to realize that sources of data are almost unlimited, and all relevant information should be considered during data analysis, if practical.
Although major sources are described above, data sources should not be constrained to a specific list. Figure 2-2 highlights several key points about the data collection, evaluation, and normalization processes.

Data Collection, Evaluation and Normalization
- Very Critical Step
- Can Be Time-Consuming
- Need Actual Historical Cost, Schedule, and Technical Information
- Know Standard Sources
- Search Out New Sources
- Capture Historical Data
- Provide Sufficient Resources

Figure 2-2: Data Collection, Evaluation & Normalization

III. Routine Data Normalization Adjustments

Data need to be adjusted for certain effects to make them homogeneous, or consistent. Developing a consistent data set is generally performed through the data analysis and normalization process. In nearly every data set, the analyst needs to examine the data to ensure the database is free of the effects of: the changing value of the dollar over time; the effects of cost improvement as the organization improves its efficiency; and the effects of various production rates during the period from which the data were collected. Figure 2-3 provides a process flow related to data normalization. This process description is not intended to be all-inclusive, but rather depicts the primary activities performed in normalizing a data set.

Figure 2-3: Data Normalization Process Flow

Some data adjustments are routine in nature and relate to items such as inflation. Routine data adjustments are discussed below. Other adjustments are more complex in nature, such as those relating to anomalies. Section IV discusses significant data normalization adjustments.

A. Inflation

Inflation is defined as a rise in the general level of prices without a rise in output or productivity. There are no fixed ways to establish universal inflation indices (past, present, or future) that fit all possible situations. Inflation indices generally draw on internal and external information, as discussed in Section II. Examples of external information include the Consumer Price Index (CPI), the Producer Price Index (PPI), and other forecasts of inflation from various econometric models. Therefore, while generalized inflation indices may be used, it may also be possible to tailor and negotiate indices on an individual basis to specific labor rate agreements (e.g., forward pricing rates) and the actual materials used on the project. Inflation indices should be based on the cost of materials and labor on a unit basis (piece, pound, hour), and should not include other considerations such as changes in manpower loading or the amount of materials used per unit of production. The key to inflation adjustments is consistency. If cost is adjusted to a fixed reference date for calibration purposes, the same type of inflation index must be used in escalating the cost forward or backward from the reference date and then to the date of the estimate.

B. Cost Improvement Curve

When first applied, cost improvement was referred to as "learning curve" theory. Learning curve theory states that as the quantity of a product produced doubles, the manufacturing hours per unit expended producing the product decrease by a constant percentage. The learning curve, as originally conceived, analyzes labor hours over successive production units of a manufactured item. The theory has since been adapted to account for cost improvement across the organization.
Both cost improvement and traditional learning curve theory are defined by the following equation:

Y = A * X^b, where:
Y = hours per unit (or constant dollars per unit)
A = first unit hours (or constant dollars per unit)
X = unit number
b = slope of the curve related to learning.

In parametric models, the learning curve is often used to analyze the direct cost of successively manufactured units. Direct cost equals the cost of both touch labor and direct materials in fixed year dollars. Sometimes this is called an improvement curve. The slope is calculated using hours or constant year dollars. A more detailed explanation of improvement curve theory is presented in Chapter 3 and Chapter 10.

C. Production Rate

Cost improvement curve theory has seen many innovations since it was originally conceived. One of the more popular innovations has been the addition of a variable to the equation to capture the organization's production rate. The production rate is defined as the number of items produced over a given time period. The following equation modifies the general cost improvement formula to capture changes in the production rate (Q^r) as well as organizational cost improvement (X^b):

Y = A * X^b * Q^r, where:
Y = hours per unit (or constant dollars per unit)
A = first unit hours (or constant dollars per unit)
X = unit number
b = slope of the curve related to learning
Q = production rate (quantity produced during the period)
r = slope of the curve related to the production rate.

The net effect of adding the production rate term (Q^r) is to adjust the first unit hours or dollars (A) for the various production rates experienced throughout the life of the production effort. The equation will also yield a rate-adjusted slope related to learning. The rate-adjusted equation must be monitored for problems of multicollinearity (X and Q having a high degree of correlation). If the model exhibits problems of multicollinearity, the analyst should account for production rate effects using an alternative method. If possible, rate effects should be derived from patterns observed in historical program data as production rates change, while holding the learning slope coefficient constant. The rate effect can vary considerably depending on what was required to effect the change; for example, were new facilities required, or did the change involve only a change in manpower or overtime? Chapter 10 provides additional information on data adjustments for inflation, learning, and production rate.

IV. Significant Data Normalization Adjustments

This section describes some of the more complex adjustments analysts make to historical cost data used in parametric analysis.

A. Consistent Scope

Adjustments are appropriate for differences in program or product scope between the historical data and the estimate being made. For example, suppose the systems engineering department made a comparison of five similar programs. After initial analysis, the organization realized that only two of the five had design-to-cost (DTC) requirements. To normalize the data, the DTC hours were deleted from the two programs to create a consistent systems engineering scope.

B. Anomalies

Historical cost data should be adjusted for anomalies (unusual events) when it is not reasonable to expect these unusual costs to be present in the new projects. The adjustments and judgments used in preparing the historical data for analysis should be fully documented.
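To make this screening step concrete, the short Python sketch below computes average hours per unit for a set of production lots and flags any lot that deviates sharply from the median. The lot figures match the example discussed next, the 15 percent threshold is an arbitrary assumption, and a flagged lot calls for investigation and documentation, not automatic exclusion.

```python
# Minimal sketch (hypothetical threshold): flag candidate anomalies in
# lot-level manufacturing hours before normalization. A flagged lot is a
# prompt for investigation and documentation, not for automatic deletion.

from statistics import median

# (lot name, total hours, units produced) -- values from the example below
lots = [
    ("Lot 1", 256_000, 300),
    ("Lot 2", 332_000, 450),
    ("Lot 3", 361_760, 380),
    ("Lot 4", 207_000, 300),
]

# Average hours per unit for each lot
avg_hours = {name: hours / units for name, hours, units in lots}

# Use the median as a robust reference point and flag large deviations
ref = median(avg_hours.values())
THRESHOLD = 0.15  # flag lots more than 15% away from the median (assumed)

for name, value in avg_hours.items():
    deviation = (value - ref) / ref
    flag = "INVESTIGATE" if abs(deviation) > THRESHOLD else "ok"
    print(f"{name}: {value:7.1f} hours/unit ({deviation:+.1%} vs median) {flag}")
```

With these figures, only the third lot is flagged, which matches the conclusion reached by inspection in the example that follows.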
For example, suppose a comparison has been made of the development test programs from five similar programs, and it is observed that one of the programs experienced a major test failure (e.g., qualification, ground test, flight test). A considerable amount of labor resources was required to fact-find, determine the root cause of the failure, and develop an action plan for a solution. Should the hours for this program be included in the database or not? This is an issue analysts must consider and resolve. If an adjustment is made to this data point, then the analyst must thoroughly document the actions taken to identify the anomalous hours.

There are other changes for which data can be adjusted, such as changes in technology. These changes must be accounted for in a contractor's estimate. Data normalization is one process typically used to make all such adjustments. In certain applications, particularly if a commercial model is used, the model inputs could be adjusted to account for certain improved technologies (see the discussion of commercial models in Chapters 5 and 6). In addition, some contractors, instead of normalizing the data for technology changes, may deduct estimated savings from the bottom-line estimate. Any adjustments made by the analyst to account for a technology change in the data must be adequately documented and disclosed. For example, suppose electronic circuitry was originally designed with discrete components, but the electronics are now implemented in ASIC technology (a more advanced technology). Or, a hardware enclosure once made from aluminum is now made of magnesium for weight reasons. What is the impact on the hours? Perfect historical data may not exist, but good judgment and analysis by an experienced analyst should supply reasonable results.

For example, suppose there are four production lots of manufacturing hours data that look like the following:

Lot      Total Hours    Units    Average Hours per Unit
Lot 1    256,000        300      853 hours/unit
Lot 2    332,000        450      738 hours/unit
Lot 3    361,760        380      952 hours/unit
Lot 4    207,000        300      690 hours/unit

Clearly, Lot 3's history should be investigated, since its average hours per unit appear peculiar. It is not acceptable to merely "throw out" Lot 3 and work with the other three lots. A careful analysis should be performed on the data to determine why it exhibited this behavior.

C. Illustration of Data Adjustment Analysis

Based on the prior discussion, the following example illustrates the data analysis process. Suppose the information in the following table represents a company's historical data and that the prospective system is similar to one built several years ago.

Parameter               Historical System                   Prospective System
Date of Fabrication     Jul 89 - Jun 91                     Jul 95 - Dec 95
Production Quantity     500                                 750
Size - Weight           22 lb. external case,               20 lb. external case,
                        5 lb. int. chassis,                 5 lb. int. chassis,
                        8 lb. elec. parts                   10 lb. elec. parts
Volume                  1 cu ft, roughly cubical            0.75 cu ft, rectangular solid
                        (12.1 x 11.5 x 12.5)                (8 x 10 x 16.2)
Other Prog Features     5% elec.; additional spare parts    5% elec.; no spare parts

These data need several adjustments. In this example, the inflation factors, the quantity difference, the rate of production effect, and the added elements in the original program (the spare parts) would require adjustment. The analyst must be careful when normalizing the data. General inflation factors are usually not appropriate for most situations.
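To make the arithmetic of these adjustments concrete, the sketch below escalates a historical unit cost with an assumed company-specific index and applies a learning-curve quantity adjustment of the form Y = A * X^b described in Section III. Every numeric value (the index series, the 90 percent slope, the base hours, and the unit cost) is hypothetical and is shown only to illustrate the mechanics, not to suggest appropriate factors.

```python
# Minimal sketch of two routine normalization steps from Section III:
# (1) escalate historical dollars to a common reference year with an index,
# (2) adjust for quantity using a learning curve Y = A * X^b.
# All numbers below (index values, 90% slope, base hours, unit cost) are
# hypothetical placeholders.

from math import log2

# Hypothetical company-specific cost index by year (reference year = 1995)
cost_index = {1990: 0.78, 1991: 0.82, 1995: 1.00}

def escalate(cost, from_year, to_year=1995):
    """Convert then-year dollars to reference-year dollars using the index."""
    return cost * cost_index[to_year] / cost_index[from_year]

def unit_hours(first_unit_hours, unit_number, slope=0.90):
    """Learning curve Y = A * X^b, where b = log2(slope)."""
    b = log2(slope)
    return first_unit_hours * unit_number ** b

# Example: a 1990 average unit cost of $12,400 expressed in 1995 dollars
print(round(escalate(12_400, 1990)))      # about 15,897 (hypothetical)

# Example: hours at unit 500 vs. unit 750 on a 90% curve, A = 1,000 hours
print(round(unit_hours(1_000, 500), 1))   # about 388.8
print(round(unit_hours(1_000, 750), 1))   # about 365.6
```

In practice the index and slope would come from the company's own calibrated history, as the surrounding discussion emphasizes.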
Ideally, the analyst will have a good index of costs specific to the industry and will use labor cost adjustments specific to the company. The quantity and rate adjustments will have to consider the quantity effects on the company's vendors and the ratio of overhead and setup to the total production cost. Likewise, with rate factors, each labor element will have to be examined to determine how strongly the production rate affects labor costs. On the other hand, the physical parameters do not suggest that significant adjustments are required. The first-order normalization of the historical data would consist of:
- Material escalation using Industry or company material cost history.
- Labor escalation using company history.
- Material quantity price breaks using company history.
- Possible production rate effects on touch labor (if any) and unit overhead costs.
Because both cases are single lot batches, and are within a factor of two in quantity, only a small learning curve or production rate adjustment would generally be required.

V. Evaluation Issues

The Defense Federal Acquisition Regulation Supplement (DFARS) 215.407-5, "Estimating Systems," states that "contractors should use historical data whenever appropriate." The DFARS also states that "a contractor's estimating system should provide for the identification of source data and the estimating methods and rationale used to develop an estimate." Therefore, all data, including any adjustments made, should be thoroughly documented by a contractor so that a complete trail is available for verification purposes. Some key questions an evaluator may ask during a review of data collection and analysis processes include: Are sufficient data available to adequately develop parametric techniques? Has the contractor established a methodology to obtain, on a routine basis, relevant data on completed projects? Are cost, technical, and program data collected in a consistent format? Will data be accumulated in a manner that is consistent with the contractor's estimating practices? Are procedures established to identify and examine any data anomalies? Were the source data used as is, or did they require adjustment? Are any adjustments made to the data points adequately documented to demonstrate that they are logical, reasonable, and defensible? Chapter 9, Auditing Parametrics, and Chapter 10, Technical Evaluation of Parametrics, provide additional information on Government evaluation criteria.

VI. Other Considerations

There are several other issues that need to be considered when performing data collection and analysis. Some of these are highlighted below.

A. Resources

Data collection and analysis activities require proper resources. Companies should therefore establish sufficient resources to perform these activities. In addition, formal processes should be established describing data collection and analysis activities. Chapter 7, Regulatory Compliance, provides information on estimating system requirements and includes a discussion of data collection and analysis procedures.

B. Information in the Wrong Format

While the contractor may indeed possess a great deal of data, in many cases the data are not in an appropriate format to support the parametric techniques being used. For example, commercial parametric models may have a unique classification system for cost accounts that differs from the one used by a company.
As a result, companies using these models would have to develop a process that compares their accounting classifications to those used by the model (also known as "mapping"). In other situations, legacy systems may have generated data to meet the needs for reporting against organizational objectives, which did not directly translate into the needs of the cost estimating and analysis function. For example, the orientation of a large number of past and existing information systems may have focused on the input side with little or no provision for making meaningful translations reflecting output data useful in CER development or similar types of analysis. The growing use of ERP systems, which have a common enterprise-wide database, should make this data disconnect less severe. Most large organizations are implementing ERP systems or are reengineering their existing Information Systems so that parametric estimating models can be interfaced with these systems quite easily. C. Differences in Definitions of Categories Many problems occur when the analyst or the database fails to account for differences in the definitions of the WBS elements across the projects included in the database. Problems also occur when the definition of the content of cost categories fails to correspond to the definition of analogous categories in existing databases. For example, some analysts put engineering drawings into the data category while others put engineering drawings into the engineering category. A properly defined WBS product tree and dictionary can avoid or minimize these inconsistencies. D. The Influence of Temporal Factors Historical data are generated over time. This means that numerous dynamic factors will influence data being collected in certain areas. For example, the definition of the content of various cost categories being used to accumulate the historical data may change as a system evolves. Similarly, inflation changes will occur and be reflected in the cost data being collected over time. In addition, as the Department of Defense (DOD) deals with a rapidly changing technical environment, both cost and noncost data generated for a given era or class of technology are necessarily limited. Many analysts would consider a data-gathering project a success if they could obtain five to ten good data points for certain types of hardware. E. Comparability Problems Comparability problems include, but are not limited to, changes in a company's department numbers, accounting systems, and disclosure statements. They also include changes from indirect to direct charge personnel for a given function. When developing a database, the analyst must normalize it to ensure the data are comparable. For example, if building a database with cost data, the analyst must first remove the effects of inflation so that all costs are displayed in constant dollars. The analyst must also normalize the data for consistency in content. Normalizing for content is the process of ensuring that a particular cost category has the same definition in terms of content for all observations in the database. Normalizing cost data is a challenging problem, but it must be resolved if a good database is to be constructed. Resolving database problems so that an information system exists to meet user needs is not easy. For example, cost analysis methodologies typically vary considerably from one analysis or estimate to another. The requirements for CERs, such as the data and information requirements are not constant over time. 
An analyst's determination of data needs at one point in time is not the final determination for all time for that system. Data needs must be reviewed periodically. The routine maintenance and associated expense of updating the database must also be considered. An outdated database may be of very little use in forecasting future acquisition costs. The more an organization develops and relies on parametric estimating methods, the more it will need to invest in data collection and analysis activities. The contractor needs to balance this investment against the efficiency gains it plans to achieve through the use of parametric estimating techniques. If the contractor moves toward an ERP system, the incremental cost to add a parametric estimating capability may not be significant.

Good data underpin the quality of any estimating system or method. As the acquisition community moves toward estimating methods that increase reliance on the historical costs of the contractor, the quality of the data cannot be taken for granted. Industry and their Government customers should find methods to establish credible databases that are relevant to the history of the contractor. From this, the contractor should be in a better position to reliably predict future costs, and the Government would be in a better position to evaluate proposals based on parametric techniques.

CHAPTER 3: COST ESTIMATING RELATIONSHIPS (CERs)

Chapter Summary

Many companies implement cost estimating relationships (CERs) to streamline the costs and cycle time associated with the proposal preparation, evaluation, and negotiation processes. Often CERs are used to price low-cost items or services that take a significant amount of resources to estimate using traditional techniques. Proper CER development and application depends heavily on understanding certain mathematical and statistical techniques. This chapter explains some of the easier and more widely used techniques. However, there are many other techniques available, which are explained in standard statistical textbooks (Appendix E contains a listing of statistical resources). The discussion in this chapter is designed to permit an analyst to understand and apply the commonly used techniques. In addition, the chapter provides "Rule-of-Thumb" guidelines for determining the merit of statistical regression models, instructions for comparing models, examples of simple and complex CERs developed and employed by some of the Parametric Estimating Reinvention Laboratory sites, and a discussion of the differences between simple and complex models.

Objective/Purpose

The primary objective of this chapter is to provide general guidance for use in developing and employing valid CERs. The chapter focuses on simple and complex CERs and provides information on implementation, maintenance, and evaluation techniques. Specifically, this chapter:
1. Discusses various techniques for implementing CERs, including the Least Squares Best Fit (LSBF) linear regression model.
2. Provides a framework for analyzing the quality or validity of a statistical model.
3. Recommends procedures for developing a broad-based CER estimating capability.

Key Assumptions

A number of quantitative applications can be used to analyze the strength of data relationships. When applicable, statistical analysis is one of the most frequently used techniques. Therefore, this chapter focuses on the use of statistical analysis as a tool for evaluating the significance of data relationships.
However, readers should be aware that other techniques are available.

I. Developing CERs

Before venturing into the details of how to develop CERs, an understanding of the definition is necessary. Numerous cost estimating references and statistical texts define CERs in a number of different ways. In deciding on a standard set of definitions for this Handbook, the Parametric Cost Estimating Initiative (PCEI) Working Group (WG) solicited feedback from a number of sources. The PCEI WG decided upon the continuum represented in Figure 3-1 and described below.

Figure 3-1: Continuum of CER Complexity

In short, CERs are mathematical expressions of varying degrees of complexity expressing cost as a function of one or more cost-driving variables. The relationship may utilize cost-to-cost variables, such as manufacturing hours to quality assurance hours, or cost-to-noncost variables, such as engineering hours to the number of engineering drawings. The continuum of CERs is synonymous with the term parametric estimating methods. Parametric estimating methods are defined as estimating techniques that rely on theoretical, known, or proven relationships between item characteristics and the associated item cost. Whether labeled a CER or a parametric estimating method, the technique relies on a value, called a parameter, to estimate the value of something else, typically cost. The estimating relationship can range in complexity from something rather simple, such as a numerical expression of value or a ratio (typically expressed as a percentage), to something more complex, such as a multi-variable mathematical expression. As the relationships increase in complexity, many analysts identify them as a cost model. A model is a series of equations, ground rules, assumptions, relationships, constants, and variables that describe and define the situation or condition being studied. If the model is developed and sold to the public for broad application, it is typically referred to as a commercial model. If the model is developed for the specific application needs of an organization, it is typically referred to as a company-developed or proprietary model.

A. Definition of a CER

As previously stated, a CER is a mathematical expression relating cost as the dependent variable to one or more independent cost-driving variables. An example of a cost-to-cost CER is using manufacturing costs to estimate quality assurance costs, or using manufacturing hours to estimate the cost of expendable material such as rivets, primer, or sealant. The key notion is that the cost of one element is used to estimate, or predict, the cost of another element. When the relationship is described as a cost-to-noncost relationship, the reference is to a CER in which a characteristic of an item is used to predict the item's cost. An example of a cost-to-noncost CER is estimating manufacturing costs from the weight of an item. Another example is using the number of engineering drawings to estimate design engineering costs. In the cost-to-noncost examples, both weight and the number of engineering drawings are noncost variables. For CERs to be valid, they must be developed using sound logical concepts. The logic concept is one where experts in the field agree, as supported by generally accepted theory, that one of the variables in the relationship (the independent variable) causes or affects the behavior of the other variable (the dependent variable).
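As a minimal, purely illustrative sketch of a cost-to-noncost CER of the kind just described, the function below estimates design engineering hours from a drawing count. The coefficients are hypothetical placeholders, not values from any calibrated database, and are shown only to make the form of such a relationship concrete.

```python
# Minimal sketch of a cost-to-noncost CER: design engineering hours estimated
# from the number of engineering drawings. The coefficients (120 fixed hours
# plus 9.5 hours per drawing) are hypothetical placeholders; a real CER's
# coefficients would be derived from and validated against company history.

def design_engineering_hours(num_drawings: int) -> float:
    """CER of the form: hours = a + b * (number of engineering drawings)."""
    a = 120.0   # fixed effort independent of drawing count (assumed)
    b = 9.5     # incremental hours per drawing (assumed)
    return a + b * num_drawings

if __name__ == "__main__":
    for drawings in (10, 40, 120):
        print(drawings, "drawings ->", design_engineering_hours(drawings), "hours")
```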
Once valid CERs have been developed, parametric cost modeling and estimating can proceed. This chapter discusses some of the more commonly used statistical techniques for CER development.

B. CER Development Process

CERs are a key tool used in estimating by the cost analyst, and they may be used at any time in the estimating process. For example, CERs may be used in the concept or validation phase to estimate the costs of a program when there is insufficient system definition. CERs may also be used in later phases to estimate program costs for use as a cross-check against estimates prepared using other techniques. CERs are also used as a basis of estimate (BOE) for proposals submitted to the Government or to higher-tier contractors. Often, before implementing complex CERs or models, analysts begin with more rudimentary CERs in order to gain the confidence of internal company and external Government representatives. The CER development process is illustrated in Figure 3-2. The beginning of the CER development process is the identification of an opportunity to improve the estimating process through the use of CERs. The specific outcome of this step is a whitepaper describing the specific opportunity, the data needs, the analysis tools, the CER acceptance criteria, and a planned process for keeping the CER current. In undertaking this effort, an organization will typically investigate a number of estimating relationships. Evaluating CER opportunities one at a time is rather inefficient when it comes to data collection. Therefore, the cost team will typically research the firm's databases for data supporting a number of opportunities simultaneously.

The value of a CER depends on the soundness of the database from which the CER is developed and, subsequently, on how it is used in future estimates. Determining the "goodness" of a CER and its applicability to the system being estimated requires a thorough analysis of the system and knowledge of the database. It is possible, however, to make a few general observations about CER development. CERs are analytical equations that relate various cost categories (either in dollars or physical units) to cost drivers or explanatory variables. CERs can take numerous forms, ranging from informal rules-of-thumb or simple analogies to formal mathematical functions derived from statistical analysis of empirical data. Regardless of the degree of complexity, developing a CER requires a concerted effort to assemble and refine the data that constitute its empirical basis. In deriving a CER, assembling a credible database is especially important and, often, the most time-consuming activity. Deriving CERs is a difficult task, and the number of valid CERs is significantly fewer than one might expect. While there are many reasons for the lack of valid CERs, the number one reason is the lack of an appropriate database.

Figure 3-2: CER Development Process

When developing a CER, the analyst must first hypothesize and test logical estimating relationships. For example, does it make sense to expect that costs will increase as aircraft engine thrust requirements increase? Given that it does make sense, the analyst will need to refine that hypothesis to determine whether the relationship is linear or curvilinear. After developing a hypothetical relationship, the analyst needs to organize the database to test the proposed relationship(s).
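One hedged illustration of this step: the sketch below organizes paired (thrust, cost) observations and computes Pearson's correlation coefficient as a first, coarse check on the hypothesized relationship. The data are invented for illustration only, and a correlation check is merely a screen, not a substitute for the statistical validation discussed later in this chapter.

```python
# Minimal sketch (hypothetical data): organize paired observations of a
# candidate cost driver (engine thrust) and cost, then compute Pearson's r
# as a first screen on the hypothesized relationship before fitting a CER.

from math import sqrt

# (thrust in klb, engine cost in $M) -- invented values for illustration only
observations = [(15, 2.1), (21, 2.9), (27, 3.8), (33, 4.2), (40, 5.6)]

def pearson_r(pairs):
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    syy = sum(y * y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    num = n * sxy - sx * sy
    den = sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    return num / den

r = pearson_r(observations)
print(f"Pearson r = {r:.3f}")  # a value near +1 supports a linear hypothesis
```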
Sometimes, when assembling a database, the analyst discovers that the raw data are at least partially in the wrong format for analytical purposes, or that the data displays irregularities and inconsistencies. Adjustments to the raw data, therefore, almost always need to be made to ensure a reasonably consistent and comparable database. It is important to note that no degree of sophistication in the use of advanced mathematical statistics can compensate for a seriously deficient database. Since the data problem is fundamental, typically a considerable amount of time is devoted to collecting data, adjusting that data to help ensure consistency and comparability, and providing for proper storage of the information so that it can be rapidly retrieved when needed. More effort is typically devoted to assembling a quality database than to any other step in the process. Chapter 2, Data Collection and Analysis, provides further information on this topic. Given the appropriate information, however, the analytical task of deriving CER equations is often relatively easy. C. Testing a CER’s Logic Complementing the issues of deriving a good database is the need to first hypothesize, then test, the mathematical form of the CER. Some analysts believe the hypothesis comes first, then the data search to build a good database. Other analysts believe the data search comes first, and given the availability of data, the subsequent determination of a logical relationship or hypothesis occurs. Regardless of the position taken, the analyst must determine and test a proposed logical estimating relationship. The analyst must structure the forecasting model and formulate the hypothesis to be tested. The work may take several forms depending upon forecasting needs. It involves discussions with engineers to identify potential cost driving variables, scrutiny of the technical and cost proposals, and identification of cost relationships. Only with an understanding of estimating requirements can an analyst attempt to hypothesize a forecasting model necessary to develop a CER. CERs do not necessarily need robust statistical testing. Many firms use CERs and validate them by evaluating how well they predicted the final cost of that portion of the project they were designed to estimate. If the CER maintains some reasonable level of consistency, the firm continues to use it. Consequently, statistical measures are not the only way to measure a CERs validity. Regardless of the validation method, application of the technique must adhere to the company's estimating system policies and procedures. Chapters 7 through 10 provide practical guidance on Government review and evaluation criteria. D. The CER Model Once the database is developed and a hypothesis determined, the analyst is ready to mathematically model the CER. While this analysis can take several forms, both linear and curvilinear, the chapter will initially consider one simple model -- the Least Squares Best Fit (LSBF) model. A number of statistical packages are available that generate the LSBF equation parameters. Most statistical software programs use the linear regression analysis process. However, the chapter will first review manual development of the LSBF equation and the regression analysis process. II. Curve Fitting There are two standard methods of curve fitting. One method has the analyst plot the data and fit a smooth curve that appears to best-fit the relationship in the data. This is known as the graphical method. 
Although in many cases the "curve" will be a straight line, the vocabulary of cost estimating and mathematics identifies this technique as curve fitting. The other method uses formulas to mathematically develop a line of "best fit." This mathematical approach is termed the LSBF method and provides the foundation for simple linear regression. Any of the mathematical analysis techniques described in this section of the handbook will work with the simplest CER (regression model) to estimate a straight line. The mathematical equation for a straight line is expressed as: Y = A + B(X). The elements of this equation are discussed in the next section. Although few relationships in cost estimating follow a pure linear relationship, the linear model is sufficiently accurate in many cases over a specified range of the data.

A. Graphical Method

To apply the graphical method, the data must first be plotted on graph paper. No attempt should be made to make the smooth curve actually pass through the data points that have been plotted. Instead, the curve should pass between the data points, leaving approximately an equal number on either side of the line. For linear data, a clear ruler or other straightedge may be used to fit the curve. The objective is to "best fit" the curve to the data points plotted; that is, each data point plotted is equally important, and the curve must consider each and every data point. Although considered a rather outdated technique today, plotting the data is still generally a good idea. Spreadsheets with integrated graphical capabilities make this task rather routine. By plotting the data, we get a picture of the relationship and can easily focus on those points that may require further investigation. Before developing a forecasting rule or mathematical equation, the analyst is advised in every case to plot the data and note any points that may require further investigation.

B. LSBF Method

The purpose of the LSBF analysis is to improve our ability to predict the next "real world" occurrence of the dependent variable. The LSBF technique is also the root of regression analysis, which may be defined as determining the mathematical nature of the association between two variables. This association is expressed in the form of a mathematical equation. Such an equation provides the ability to predict one variable on the basis of knowledge of the other variable. The variable whose value is to be predicted is called the dependent variable. The variable for which knowledge is available or can be obtained is called the independent variable. In other words, the value of the dependent variable depends on the value of the independent variable(s). The relationship between variables may be linear or curvilinear. A linear relationship means that the functional relationship can be described graphically (on an X-Y coordinate system) by a straight line and mathematically by the common form:

Y = A + B(X), where:
Y = the calculated value of the dependent variable
X = the independent variable
B = the slope of the line (the change in Y divided by the change in X)
A = the point at which the line intersects the vertical axis (Y-axis).

The bi-variate regression equation (the linear relationship of two variables) consists of two distinctive parts, the functional part and the random part. The equation for a bi-variate regression population is: Y = A + B(X) + E.
The portion of the equation given by "A + B(X)" is the functional part (a straight line), and E (the error term) is the random part. A and B are parameters of the population that exactly describe the intercept and slope of the relationship. The term "E" represents the random part of the equation. The random part of the equation is always present because of the errors of assigning value, measurement, and observation. These types of errors always exist because of human limitations and the limitations associated with real world events. Since it is practically impossible to capture data for an entire population, we normally work with a representative sample from that population. We denote that we are working with a sample by adjusting our equation to the form: Y = a + b(X) + e. Again, the term "a + b(X)" represents the functional part of the equation and "e" represents the random part. The estimates of the true population parameters "A" and "B" are represented in the sample equation by "a" and "b", respectively. In this sense, then, "a" and "b" are statistics; that is, they are estimates of population parameters. As statistics, they are subject to sampling errors. Consequently, a good random sampling plan is important. The LSBF method specifies the one line that best fits the data set. The method does this by minimizing the sum of the squared deviations between the observed values of Y and the calculated values of Y. The observed value represents the value that is actually recorded in the database, while the calculated value of Y, identified as Yc, is the value the equation predicts for the same value of X. For example, suppose we estimated engineering hours based on the number of drawings using the following linear equation: EngrHours = 467 + 3.65 (NumEngrDrawings). In this case "EngrHours" is the dependent, or Y-variable, and "NumEngrDrawings" is the independent, or X-variable. Suppose the company's database contained 525 hours for a program containing 15 engineering drawings. The 525 hours represents the observed value for Y when X is equal to 15. The equation, however, would have predicted 521.75 hours (Yc = 467 + 3.65(X) = 467 + 3.65(15) = 521.75). The "521.75" is the calculated value of Y, or Yc. The difference between the observed and calculated value represents the error ("e") in the equation (model). The LSBF technique analyzes each (X,Y) pair in the database, refining the parameters for the slope and intercept terms, until it finds the one equation for the line that minimizes the sum of the squared error terms. To illustrate this process, assume the measurement of the error term for four points: (Y1 - Yc1), (Y2 - Yc2), (Y3 - Yc3), (Y4 - Yc4). The line that best fits the data, as shown in Figure 3-3, is the line that minimizes the following summation:

Σ (Yi - Yci)²

where "i" is simply a counting scheme to denote that the technique minimizes the squared distance for all elements in the data set. The data set starts with observation number 1 and ends with the last one; in this case, the last observation is number 4.

Figure 3-3: LSBF Graphical Estimation

To calculate the LSBF line for a database of n observations, the analyst needs to find the "a + bX" that minimizes:

Σ (Yi - Yci)² = Σ [Yi - (a + bXi)]²

Fortunately, calculus shows that the "a" and "b" parameters minimize the sum of the squared error terms when they satisfy:

(1) ΣY = na + bΣX
(2) ΣXY = aΣX + bΣX²

Equations (1) and (2) are called the normal equations of the LSBF line.
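To make the minimization idea concrete, the short Python sketch below applies the engineering-hours CER from the example above to a handful of observations and totals the squared errors. Only the CER coefficients (467 and 3.65) and the 15-drawing, 525-hour data point come from the text; the other observations are hypothetical and are included solely to show the quantity that the normal equations are built to minimize.

# Illustrative sketch: sum of squared errors for a candidate CER line.
# CER from the text: EngrHours = 467 + 3.65 * NumEngrDrawings.
# Only the (15 drawings, 525 hours) observation comes from the text;
# the remaining observations are hypothetical.
observations = [(15, 525), (10, 508), (22, 551), (30, 576)]  # (drawings, observed hours)

a, b = 467.0, 3.65  # intercept and slope of the candidate line

sse = 0.0
for x, y_observed in observations:
    y_calculated = a + b * x           # Yc, the value the equation predicts
    error = y_observed - y_calculated  # e = Y - Yc
    sse += error ** 2                  # accumulate the squared error
    print(f"X = {x:>2}   Y = {y_observed:6.2f}   Yc = {y_calculated:7.2f}   e = {error:7.2f}")

print(f"Sum of squared errors for this line: {sse:.2f}")
# The normal equations (1) and (2) identify the one (a, b) pair that makes this sum as small as possible.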
References contained in any comprehensive statistical textbook will illustrate that these two equations do meet the requirements of the ordinary LSBF regression. These properties are: the technique considers all points, and the sum of the squared deviations between the line and the observed points is the minimum value possible, that is, Σ(Y - Yc)² = ΣE² = a minimum. Similarities between these two properties and the arithmetic mean should also be observed. The arithmetic mean of the X-data is the sum of the values of the independent variable divided by the number of observations, X̄ = ΣX/n, and the mean of the Y-data is the sum of the "Ys" divided by the number of observations, Ȳ = ΣY/n. It follows that the point (X̄, Ȳ) falls on the LSBF line. To calculate the "a" and "b" for the LSBF line, we need a spreadsheet format, as shown in Figure 3-4.

Computation Element          X      Y      X*Y       X²       Y²
Observation 1                X1     Y1     X1*Y1     X1²      Y1²
Observation 2                X2     Y2     X2*Y2     X2²      Y2²
Observation 3                X3     Y3     X3*Y3     X3²      Y3²
...                          ...    ...    ...       ...      ...
Sum of the Column (Σ)        ΣX     ΣY     ΣXY       ΣX²      ΣY²

Figure 3-4: Generic LSBF Analysis Table

To illustrate the calculations for a line of best fit, suppose we collected data and assembled it in the LSBF Analysis Table format, as shown in Figure 3-5.

Computation Element          X      Y      XY        X²       Y²
Observation 1                4      10     40        16       100
Observation 2                11     24     264       121      576
Observation 3                3      8      24        9        64
Observation 4                9      12     108       81       144
Observation 5                7      9      63        49       81
Observation 6                2      3      6         4        9
Sum of the Column (Σ)        36     66     505       280      974

Figure 3-5: LSBF Analysis Example

From the "normal equations" for the LSBF technique, we can derive equations to calculate "a" and "b" directly. The equations for "b" and "a" are given by:

(3) b = (nΣXY - ΣXΣY) / (nΣX² - (ΣX)²), and
(4) a = Ȳ - b(X̄).

(Recall that once we know the slope "b," we can solve the general equation Y = a + b(X) for "a" because we know that the point (X̄, Ȳ) must lie on the line and, therefore, we can solve the equation directly.) Solving first for "b," we use the data from Figure 3-5 and substitute the values into equation (3). Recall that X̄ = ΣX/n and Ȳ = ΣY/n, where n = the number of observations, and notice that the last row of the figure contains the sum of values (Σ) for the element shown in the top row. Solving for "b" yields:

b = [6(505) - (36)(66)] / [6(280) - (36)²] = (3030 - 2376) / (1680 - 1296) = 654 / 384 ≈ 1.70

Solving for "a" yields:

a = Ȳ - b(X̄) = 11 - (1.703)(6) ≈ 0.78

Therefore, the LSBF equation for the line is Yc = 0.78 + 1.70(X), where Yc is the calculated value of Y.

C. Limitations, Errors and Caveats of LSBF Techniques

When working with the LSBF technique, there are a number of limitations, errors and caveats to note. The following are some of the more obvious ones.

1) Assumptions of the LSBF Model

With the LSBF method, there are a number of critical assumptions for the theory to work precisely. If any of the assumptions are not valid, then theoretically the technique is flawed. Many applied mathematicians, however, consider the assumptions more as guidelines on when the technique will work best. If an assumption is violated, then the next question is how significant the violation is. If the violation is relatively minor, or the data very nearly comply, then the technique is generally satisfactory for estimating. The size of the error term and other statistical measures should provide sufficient indication of the validity of the technique, even when the data do not completely adhere to the assumptions identified below: The values of the dependent variable are distributed by a normal distribution function around the regression line. The mean value of each distribution lies on the regression line. The variance of each array of the independent variable is constant. The error term in any observation is independent of the error term in all other observations. When this assumption is violated, the data are said to be autocorrelated.
This assumption requires the error term to be a truly random variable. There are no errors in the values of the independent variables. The regression model specifies that the independent variable be a fixed number, and not a random variable. All causation in the model is one way. The causation must go from the independent variable to the dependent variable. Causation, though neither a statistical nor a mathematical requirement, is a highly desirable attribute when using the regression model for forecasting. Causation, of course, is what cost analysts are expected to determine when they hypothesize the mathematical logic of a CER equation.

2) Extrapolation Beyond the Range of the Observed Data

An LSBF equation is theoretically valid only over the same range of data from which the sample was initially taken. In forecasting outside this range, the shape of the curve is less certain and there is more estimating risk involved. Less credence is given to forecasts made with data falling outside the range of the original data. However, this does not mean that extrapolation beyond the relevant range is always invalid. It may well be that forecasting beyond the relevant range is the only suitable alternative available. The analyst must keep in mind that extrapolation assigns values using a relationship that has been measured for circumstances that may differ from those used in the forecast. It is the analyst's job to make this determination, in coordination with the technical and programmatic personnel from both the company and the Government.

3) Cause and Effect

Regression and correlation analysis can in no way determine cause and effect. It is up to the analyst to do a logic check, determine an appropriate hypothesis, and analyze the database so that an assessment can be made regarding cause and effect. For example, assume a high degree of correlation between the number of public telephones in a city and city liquor sales. Clearly, there is no cause and effect involved here. A variable with a more logical nexus, such as population, is a more causal independent variable that drives both the number of public telephones and liquor sales. Analysts must ensure that they have chosen appropriately related data sets and that real cause and effect is at work in their CERs.

4) Using Past Trends to Estimate Future Trends

It is very important to recognize that conditions change. If the underlying population is no longer relevant due to changes in technology, for example, then the LSBF equation may not be the best forecasting tool to use. When using a CER, the analyst needs to ensure the factors in the forecast still apply to the original historical LSBF equation.

D. Multiple Regression

In simple regression analysis, a single independent variable (X) is used to estimate the dependent variable (Y), and the relationship is assumed to be linear (a straight line). This is the most common form of regression analysis used in CER development. However, there are more complex versions of the regression equation that consider the effects of more than one independent variable. Multiple regression analysis, for example, assumes that the change in Y can be better explained by using more than one independent variable. For example, automobile gasoline consumption may be largely explained by the number of miles driven. However, we may postulate a better explanation if we also consider factors such as the weight of the automobile. In this case, the value of Y would be explained by two independent variables.
Yc = a + b1X1 + b2X2

where:
Yc = the calculated or estimated value for the dependent variable
a = the Y intercept; the value of Y when all X-variables = 0
X1 = the first independent (explanatory) variable
b1 = the slope of the line related to the change in X1; the value by which Yc changes when X1 changes by one
X2 = the second independent variable
b2 = the slope of the line related to the change in X2; the value by which Yc changes when X2 changes by one

Finding the right combination of explanatory variables is no easy task. Relying on the general process flow in Figure 3-2, however, helps immeasurably. Postulating the theory of which variables most significantly and independently contribute toward explaining cost behavior is the first step. Many applied statisticians then use a technique called stepwise regression to focus on the most important cost driving variables. Stepwise regression is the process of "introducing the X variables one at a time (stepwise forward regression) or by including all the possible X variables in one multiple regression and rejecting them one at a time (stepwise backward regression). The decision to add or drop a variable is usually made on the basis of the contribution of that variable to the ESS [error sum of squares], as judged by the F-test."1 Stepwise regression allows the analyst to add variables, or remove them, in search of the best model to predict cost. Stepwise regression, however, requires the analyst to carefully understand the variables being introduced to the model, to hypothesize the effect the variables should have on the model, and to monitor for the effects of multicollinearity. Multicollinearity occurs when two or more presumed independent variables exhibit a high degree of correlation with each other. In short, the explanatory variables are not making independent contributions toward explaining the variance in the dependent variable. The mathematics of regression analysis cannot separate or distinguish between the contributions each variable is making. This prevents the analyst from determining which variable is stronger or whether the sign on the parameter for that variable is correct. The analyst must rely on the postulated theory and pair-wise correlation to help resolve this dilemma. Symptoms of multicollinearity include a high explanatory power of the model accompanied by insignificant or illogical (incorrect sign) coefficient estimates. With multicollinearity, the math may still produce a valid point estimate for Yc. The analyst, therefore, may still be able to predict with the model. They must, however, use the entire equation, and can only project point estimates. Multicollinearity does not allow the analyst to trust the value or the sign of individual parameter coefficients. More detail on multiple regression and stepwise regression is beyond the scope of this Handbook. Please refer to Appendix E for references to web sites and other resources that further address this topic.

E. Curvilinear Regression

In some cases, the relationship between the dependent and independent variables may not be linear. Instead, a graph of the relationship on ordinary graph paper would depict a curve. For example, improvement curve analysis uses a special form of curvilinear regression. Except for the brief review of cost improvement curve analysis that follows, curvilinear regression is beyond the scope of this Handbook. As stated above, please refer to Appendix E for sources of additional information.
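As a brief illustration of the multiple regression form described in Section D above, the following Python sketch fits Yc = a + b1X1 + b2X2 by ordinary least squares using numpy. The gasoline-consumption figures are hypothetical and serve only to show the mechanics; a statistics package would also report the t-stats and other diagnostics discussed later in this chapter.

import numpy as np

# Hypothetical data for the gasoline-consumption illustration in Section D:
# X1 = miles driven (thousands), X2 = vehicle weight (thousands of pounds), Y = gallons consumed.
x1 = np.array([8.0, 12.0, 10.0, 15.0, 9.0, 14.0])
x2 = np.array([2.8, 3.4, 3.0, 4.1, 3.2, 3.8])
y = np.array([310.0, 480.0, 390.0, 620.0, 360.0, 560.0])

# Design matrix with a column of ones for the intercept "a".
X = np.column_stack([np.ones_like(x1), x1, x2])

# Ordinary least squares solution for [a, b1, b2].
coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
a, b1, b2 = coefficients
print(f"Yc = {a:.1f} + {b1:.1f}(X1) + {b2:.1f}(X2)")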
F. Cost Improvement Curve Analysis

The cost improvement curve form of analysis is a model frequently used in cost estimating and analysis. Many of the commercial cost estimating models base their methods on some form of the basic cost improvement curve. The basic form of the "learning curve" equation is Y = a(X^b). Through a logarithmic transformation of the data and the equation, the model appears intrinsically linear: Ln(Y) = Ln(a) + b Ln(X). For both forms of the equation the following conventions apply:

Y = cost of unit #X (or average for X units)
a = cost of the first unit
b = learning curve coefficient

Note that the equation Ln(Y) = Ln(a) + b Ln(X) is of precisely the same form as the linear equation Y = a + b(X). This means that the equation Ln(Y) = Ln(a) + b Ln(X) can be graphed as a straight line, and all the regression formulae apply to this equation just as they do to the equation Y = a + b(X). In order to derive a cost improvement curve from cost data (units or lots), the regression equations need to be used, whether the calculations are performed manually or by using a computer-based statistical package. In this sense, the cost improvement curve equation is a special case of the LSBF technique. In cost improvement curve methodologies the cost is assumed to decrease by a fixed proportion each time the quantity doubles. This constant improvement percentage is called the learning curve "slope" (e.g., 90%). The slope is related to the learning curve coefficient (b) through the equation:

b = Ln(slope) / Ln(2)

In applying the equation, the analyst must use the decimal form of the slope percentage (e.g., 0.90). The divisor is Ln(2) because the theory is based on a constant percentage reduction in cost each time the number of repetitions doubles. Any good statistical package can perform all the calculations to derive the "a" and "b" terms of the equation. A quality package will let you customize your outputs and calculate many statistics, including: frequency distributions, percentiles, t-tests, variance tests, Pearson correlation and covariance, regression, analysis of variance (ANOVA), factor analysis, and more. Graphics and tables such as scattergrams, line charts, pie charts, bar charts, histograms, and percentiles are generally available to the user. Using these simple software tools greatly simplifies the statistical analysis task.

III. Testing the Significance of the CER

Having discussed the LSBF regression technique, the chapter next turns to evaluating the quality of the CER. This answers the questions: How good is a CER equation, and how good is the CER likely to be for estimating the cost of specific items or services? What is the confidence level of the estimate (i.e., how likely is the estimated cost to fall within a specified range of cost outcomes)? Many analysts rely on two primary statistics to make this determination: the coefficient of correlation (R) and the related coefficient of determination (R²). Both of these measures simply indicate the degree of relatedness between the variables. Neither measure indicates cause and effect. Cause and effect requires a check of logic and depends on the acumen of the analyst. There are a number of other statistics to evaluate to expand the knowledge and confidence in the regression equation and the assurance of its forecasting capability. Figure 3-6 provides an example of one possible list of items to examine when evaluating the quality of a CER. The matrix categories are listed in order of precedence.
The top portion of the matrix focuses on the statistical validation of the CER or model, while the bottom portion of the matrix focuses on the use of the CER or model in predicting future estimates. Figure 3-7 provides definitions of the evaluation elements shown in the matrix below.

Figure 3-6: CER Quality Review Matrix

One caution is warranted when performing statistical analysis of a relationship. There is no one statistic that disqualifies a CER or model, nor is there any one statistic that "validates" a CER or model. The math modeling effort must be examined from a complete perspective, starting with the data and the logic of the relationship. For example, the matrix shown in Figure 3-6 requires an analyst to provide a complete narrative explanation of the quality of the database and the logic of the proposed model. Only after ensuring that the data and the logic of the relationship are solid should the analyst begin evaluating the statistical quality of the model. Statistical examination typically begins with an evaluation of the individual variables in the model. The t-stat for each explanatory variable is the most common method to evaluate the variable's significance in the relationship. The next step is to assess the significance of the entire equation. The F-stat is the most common statistic used to assess the quality of the entire equation. Assuming the individual variable(s) and the entire equation have significance, the next step is to judge the size and proportion of the equation's estimating error. The standard error of the estimate (SEE or SE) and the coefficient of variation (CV) provide this insight. Finally, the typical statistical analysis concludes with examining the value of the coefficient of determination (R²), or the Adjusted R² when comparing models with different numbers of independent variables. The coefficient of determination measures the percentage of the variation in the dependent variable explained by the independent variable(s). The elements in the matrix below the double line focus on the geography of the data on which the CER or model is built. Ideally, the analyst prefers a strong statistical model with a large number of observations, using the fewest number of variables to formulate the equation. In addition, the analyst would like to see few actual data points that the model predicts poorly. Finally, a critical piece of any evaluation is to identify the range of the independent values on which the model was built. Theoretically, the model is only valid over this relevant range of the independent value data. In practice, use of the model is permissible outside of this range so long as the hypothesized mathematical relationship remains valid; this is likely to extend only a small distance beyond the actual values of the data. The range of validity is a judgment call, and the analyst should rely on those knowledgeable in the element being estimated to help establish the range over which the CER will provide reasonable predictions. Because the LSBF model relies so heavily on the mean values for the dependent variable, the matrix provides for recording the mean value of the dependent variable and its associated statistics. The matrix allows the analyst to compare the statistics reported for the CER or model with the statistics of the mean of the dependent variable as a benchmark. Figure 3-7, on the following page, provides a non-statistical interpretation of some of the statistics referred to in the matrix.
Appendix E provides several resources readers can use to obtain additional information. There are no defined standards related to acceptable criteria for the various statistics. The determination of acceptable criteria for a valid CER is based on discussions between the contractor and its customers. There are no absolute thresholds. The analysis matrix, the modeler's data collection and normalization process, and the associated logic all form the basis for accepting the CER as the basis for estimating. An example provided later in the chapter uses a version of this matrix and analysis process. In order to keep reasonable statistical criteria in the evaluation, the analyst must always ask: "If I reject this CER as the basis for estimating, is the alternative method any better?"

F-stat: Tests whether the entire equation, as a whole, is valid.
t-stat: Tests whether the individual X-variable(s) is/are valid.
Standard Error (SE): The average estimating error when using the equation as the estimating rule.
Coefficient of Variation (CV): The SE divided by the mean of the Y-data; a relative measure of estimating error.
Coefficient of Determination (R²): The percent of the variation in the Y-data explained by the X-data.
Adjusted R²: R² adjusted for the number of X-variables used to explain the variation in the Y-data.
Degrees of Freedom (d.f.): The number of observations (N) less the number of estimated parameters (# of X-variables + 1 for the constant term "a"). The concept of parsimony applies, in that a preferred model is one with high statistical significance using the fewest variables.
Outliers: Y-observations that the model predicts poorly. This is not always a valid reason to discard the data.
P-value: The probability level at which the statistical test would fail, suggesting the relationship is not valid. P-values less than 0.10 are generally preferred (i.e., only a 10% chance, or less, that the model is not valid).

Figure 3-7: Non-statistical Interpretation of Statistical Indicators

IV. When to Use a CER

When a CER has been built from an assembled database, is based on a hypothesized logical statistical relationship, and falls within an acceptable evaluation standard, it is ready for application. A CER may be used to forecast costs, or it may be used to cross-check an estimate developed using another estimating technique. For example, an analyst may have generated an estimate using a grassroots approach (a detailed build-up by hours and rates) and then used a CER as a sanity check to test the reliability of the grassroots approach. Generally, a CER built for a specific forecast may be used with far more confidence than a generic CER. Care must be taken in using a generic CER when the characteristics of the forecast universe are, or are likely to be, different from those reflected in the CER. Qualifying a generic CER may be necessary to ensure that the database and the assumptions made for its development are valid for its application. It may also be necessary to update the database with data appropriate for the forecast. When using a generic CER as a point of departure, the analyst may need to enhance or modify the forecast in light of any other available supplementary information. This most likely will involve several iterations before the final forecast is determined. It is important to carefully document the iterations so that an audit trail exists explaining how the generic CER evolved to become the final forecast.
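For readers who want to see how the indicators defined in Figure 3-7 are generated, the following minimal Python sketch computes them for the simple Figure 3-5 example developed earlier in the chapter. It is a sketch only; a statistical package would normally produce these values (along with the P-values) automatically, and acceptable thresholds remain a matter of judgment, as noted above.

import math

# Minimal sketch: statistics from Figure 3-7 for a one-variable LSBF equation,
# using the Figure 3-5 data from earlier in the chapter.
x = [4, 11, 3, 9, 7, 2]
y = [10, 24, 8, 12, 9, 3]
n, k = len(x), 1                          # observations and number of X-variables

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar                     # the LSBF line: Yc = a + b(X)

sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))   # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)                      # total variation in Y

df = n - (k + 1)                          # degrees of freedom
se = math.sqrt(sse / df)                  # standard error of the estimate
cv = se / y_bar                           # coefficient of variation
r2 = 1 - sse / sst                        # coefficient of determination
adj_r2 = 1 - (1 - r2) * (n - 1) / df      # adjusted for the number of X-variables
t_b = b / (se / math.sqrt(s_xx))          # t-stat on the slope "b"
f_stat = ((sst - sse) / k) / (sse / df)   # F-stat for the equation as a whole

print(f"SE = {se:.2f}  CV = {cv:.0%}  R2 = {r2:.2f}  Adj R2 = {adj_r2:.2f}  "
      f"t(b) = {t_b:.2f}  F = {f_stat:.2f}  d.f. = {df}")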
In order to apply good judgment in the use of CERs, the analyst needs to be mindful of their strengths and weaknesses. Some of the more common strengths and weaknesses are presented below:

A. Strengths

1. CERs can be excellent predictors when implemented correctly, and they can be relied upon to produce quality estimates when used appropriately.
2. Use of valid CERs can reduce proposal preparation, evaluation, and negotiation costs, as well as cycle time, particularly in regard to low-cost items that are time and cost intensive to estimate using other techniques.
3. They are quick and easy to use. Given a CER equation and the required input data, developing an estimate is a quick and easy process.
4. A CER can be used with limited system information. Consequently, CERs are especially useful in the research, development, test and evaluation (RDT&E) phase of a program.

B. Weaknesses

1. CERs are sometimes too simplistic to forecast costs. When detailed information is available, the detail may be more reliable for estimates than a CER.
2. Problems with the database may mean that a particular CER should not be used. While the analyst developing a CER should validate both the CER and the database, it is the responsibility of any user to validate the CER by reviewing the source documentation. The user should understand what the CER is supposed to estimate, what data were used to build that CER, how old the data are, and how they were normalized. Never use a CER or cost model without reviewing its source documentation.

The next two sections of the chapter focus on the application of the CER technique by providing examples, ranging from common CER applications to applications by contractors who participated in the Parametric Estimating Reinvention Laboratory.

V. Examples of CERs in Use

CERs reflect changes in prices or costs (in constant dollars) as some physical, performance, or other cost-driving parameter(s) changes. The same parameter(s) for a new item or service can be input to the CER model and a new price or cost can be estimated. Such relationships may be applied to a wide variety of items and services.

A. Construction

Many construction contractors use a rule of thumb that relates floor space to building cost. Once a general structural design is determined, the contractor or buyer can use this relationship to estimate total building price or cost, excluding the cost of land. For example, when building a brick two-story house with a basement, a builder may use $60/square foot (or whatever value is currently reasonable for the application) to estimate the price of the house. Assume the plans call for a 2,200 square foot home. The estimated build price, excluding the price of the lot, would be: $132,000 ($60/sq. ft. x 2,200 sq. ft.).

B. Electronics

Manufacturers of certain electronic items have discovered that the cost of completed items varies directly with the total number of electronic parts in the item. Thus, the number of integrated circuits in a specific circuit design may serve as an independent variable (cost driver) in a CER to predict the cost of the completed item. Assume a CER analysis indicates that $57.00 is required for set-up, and an additional cost of $1.10 per integrated circuit is required.
If evaluation of the drawing revealed that an item was designed to contain 30 integrated circuits, substituting the 30 parts into the CER produces the following estimated cost:

estimated item cost = $57.00 + $1.10 per integrated circuit * number of integrated circuits = $57.00 + $1.10 (30) = $57.00 + $33.00 = $90.00

C. Weapons Procurement

In the purchase of an airplane, CERs are often used to estimate the cost of the various parts of the aircraft. One item may be the price for a wing of a certain type of airplane, such as a supersonic fighter. History may enable the analyst to develop a CER relating wing surface area to cost. The analyst may find that there is an estimated $40,000 of wing cost (for instance, nonrecurring engineering) not related to surface area, and another $1,000/square foot that is related to surface area to build one wing. For a wing with 200 square feet of surface area, we could estimate a price as:

estimated price = $40,000 + 200 sq. ft. x $1,000 per sq. ft. = $40,000 + $200,000 = $240,000

VI. Examples of CER Development at Parametric Estimating Reinvention Laboratory Sites

Throughout this section, the description of the process used in CER development, data requirements, validation, and documentation of simple CERs will rely on the experiences of three Reinvention Laboratory sites. These sites included Boeing Aircraft & Missiles Systems (St. Louis, MO), Boeing Aircraft & Missiles Systems (Mesa, AZ), and Lockheed Martin Tactical Aircraft Engines (Ft. Worth, TX). Figure 3-8 provides examples of simple CERs implemented by these lab sites.

CER Title: Panstock Material
Pool Description: Allocated panstock dollars charged.
Base Description: Manufacturing Assembly "touch" direct labor hours charged.
Application: Panstock is piece-part materials consumed in the manufacturing assembly organization. The panstock CER is applied to 100% of estimated direct labor hours for manufacturing assembly effort.

CER Title: F/A-18 Software Design Support
Pool Description: Allocated effort required to perform software tool development and support for computer & software engineering.
Base Description: Computer and software engineering direct labor hours charged.
Application: F/A-18 computer and software engineering support direct labor hours estimated for tool development.

CER Title: Design Hours
Pool Description: Design engineering (including analysis and drafting) direct labor hours charged.
Base Description: Number of design drawings associated with the pool direct labor hours.
Application: The design hours per drawing CER is applied to the engineering tree (an estimate of the drawings required for the proposed work).

CER Title: Systems Engineering
Pool Description: Systems engineering (including requirements analysis and specification development) direct labor hours charged.
Base Description: Design engineering direct labor hours charged.
Application: The systems engineering CER is applied to the estimated design engineering direct labor hours.

CER Title: Tooling Material
Pool Description: Nonrecurring, in-house, tooling raw material dollar costs charged.
Base Description: Tooling nonrecurring direct labor hours charged.
Application: The tooling material CER is applied to the estimated nonrecurring tooling direct labor hours.

CER Title: Test/Equipment Material (dollars for avionics)
Pool Description: Material dollars (<$10k).
Base Description: Total avionics engineering procurement support group direct labor hours charged.
Application: The test/equipment material dollars CER is applied to the estimated avionics engineering procurement support group direct labor hours.

Figure 3-8: Examples of Simple CERs

A. Developing Simple CERs

For CERs to be valid, they must be developed and tested using the principles previously discussed.
Analysts rely on many forms of CERs in developing estimates, and employ CERs throughout the various phases of the acquisition cycle. The value of a CER depends on the soundness of the database from which it was developed, and on the appropriateness of its application to the next estimating task. Determining the "goodness" of a CER, and its applicability to the system being estimated, requires the cost analyst to have a thorough understanding of both the CER and the product being estimated. As a tool, CERs are analytical equations that relate various cost categories (either in dollars or physical units) to cost drivers. In mathematical terms, the cost drivers act as an equation's explanatory variables. CERs can take numerous forms, ranging from informal "rules of thumb" or simple analogies to formal mathematical functions derived from statistical analysis of empirical data. When developing a CER, the analyst should focus on assembling and refining the data that constitute the empirical basis for the CER.

1) Data Collection/Analysis

Sometimes, when assembling a database, the analyst discovers the raw data are in the wrong format for analytical purposes, or the data display irregularities and inconsistencies. Therefore, adjustments to the raw data usually need to be made to ensure a reasonably consistent and comparable database. Not even the use of advanced mathematical modeling techniques can overcome or compensate for a seriously deficient database. Typically, a considerable amount of time is devoted to collecting data, normalizing (adjusting) the data to ensure consistency and comparability, and providing proper information storage so the data can be rapidly retrieved. Indeed, more effort is typically devoted to assembling a quality database than to any other task in the process. When enough relevant data have been collected, the analytical task of deriving CER equations is often relatively easy. Data normalization is essential for ensuring consistency and comparability. Chapter 2 discusses data collection and analysis activities in further detail. As a general rule, normalizing data typically addresses the following issues:

Type of effort – such as non-recurring versus recurring, development versus change proposals, and weapon systems versus ground support equipment.
Time frame – such as the number of months/years covering the period of performance, and total cumulative data from inception to completion.
Measurable milestones for collecting data – such as first flight, drawing release, program completion, and system compliance test completion.

2) Validation Requirements

CERs, like any other parametric estimating methodology, are of value only if they can demonstrate, with some level of confidence, that they produce results within an acceptable range of accuracy. The CERs must also demonstrate reliability for an acceptable number of trials, and they should be representative of the database domain to which they are applied. A process that adequately assures that the CERs and estimating methodology meet these requirements is called validation. Since both the developer and the customer must, at some point, agree on the validation criteria, the Reinvention Laboratory demonstrated that the use of Integrated Product Teams (IPTs) is a best practice for implementing CERs. Preferably, IPTs should consist of members from the contractor, the buying activity, the Defense Contract Management Command (DCMC), and the Defense Contract Audit Agency (DCAA).
Chapter 8 provides guidance relative to establishing an implementation team. One of the Parametric Estimating Reinvention Laboratory teams developed a validation process flow, illustrated in Figure 3-9. This process is an adaptation of the testing methodology described earlier in this chapter. The process depicted in Figure 3-9, and described in Figure 3-10, is a formal company procedure to develop and implement CERs. The company developed this methodology with its customer, the local DCMC, and the local DCAA. This procedure describes the activities and criteria for validating Simple CERs, Complex CERs, and models. Figure 3-11 provides the team's guidelines for statistical validation criteria and is an adaptation of the CER analysis matrix discussed earlier in this chapter. In this example, an IPT was formed and officially designated the "Joint Estimating Relationship Oversight Panel" (JEROP). Figure 3-12 provides a description of the JEROP membership. The JEROP manages the processes associated with implementing, maintaining, and documenting CERs.

Figure 3-9: ER Validation Process
Figure 3-10: Discussion of Activities
Figure 3-11: Summary of ER Report Card Criteria

JEROP Membership
Developer (Company Personnel): Group Manager-Estimating; Principal Specialist-Estimating; Manager-Contracts & Pricing-Spares; Sr. Specialist-Accounting
DCAA: Supervisory Auditor
DCMC: Industrial Engineer; Contract Price Analysts
In this case, the customer was not a full-time member of the IPT. However, the customer provided feedback to the IPT on a routine basis.
Figure 3-12: JEROP Membership

It is important to note that in establishing this process, the IPT uses the report card criteria as a starting point to evaluate the strength of the CERs. The IPT does not use the statistical tests as its sole criterion for accepting or rejecting the CERs. An equally important factor in its assessment of the quality of the CER is non-quantitative information, such as the materiality of the effort and the quality of possible alternative estimating methods. The team's experience demonstrated that while statistical analysis is a useful tool, it should not be the sole criterion for accepting or rejecting CERs. The highest priority is determining that the data relationships are logical, the data used are credible, and adequate policies and procedures have been established to ensure CERs are implemented, used, and maintained appropriately.

3) Documentation

When implementing CERs, a company should develop standard formats for documentation. Consistency in documentation provides a clear understanding of how to apply and maintain the CER. The documentation should evolve during the development process, and during each stage of development the team should maintain documentation on a variety of items. At a minimum, this should include all necessary information for a third party to recreate or validate the CER:

An adequate explanation of the effort to be estimated by the CER.
Identification and explanation of the base, including the rationale for the base chosen when appropriate.
Calculation and description of the effort (hours, dollars, etc.) in the pool and base.
Application information.
Complete actual cost information for all accounting data used. This provides an audit trail that is necessary to adequately identify the data used. Noncost information (technical data) should also be included.

B. Lessons Learned from CER Implementation

Simple CERs are, by their very nature, straightforward in their logic and application.
Lessons learned from the Parametric Estimating Reinvention Laboratory demonstrated that IPTs are a best practice for implementing broad use of CER-based parametric estimating techniques. Perhaps one of the most valuable accomplishments of the Reinvention Laboratory teams was the successful partnership established between the contractor, customer, DCMC, and DCAA at each of the lab sites. Figure 3-13 summarizes the lessons learned from the IPTs that implemented CERs.

Cultural Change – It takes time and effort to work together openly in an IPT environment. It may take a while to build trust if the existing climate does not encourage a collaborative environment with common goals.
Empowering the IPTs – Team members should be empowered to make decisions. Therefore, the teams should include people with decision-making authority.
Joint Training – All team members should participate in training sessions together. Joint IPT training provides a common understanding of terminology and techniques, and it facilitates team-building.
Strong Moderating – Teams should meet regularly and focus on the most significant issues. This may require using a facilitator with strong moderating skills.
Management Support – Without total commitment from management, IPTs may question the value of their efforts. Management should provide support in terms of resources, consultation, interest in the progress, resolution of stalemates, and feedback through communication.

Figure 3-13: PCEI Lessons Learned

VII. Evaluating CERs

A. Government Evaluation Criteria

Contractors should implement the use of CERs as part of their estimating system. Chapter 7, Regulatory Compliance, discusses estimating system requirements in detail. In general, Government evaluators will focus on evaluating and monitoring CERs to ensure they are reliable and credible cost predictors. Specific Government evaluation criteria are discussed in Chapter 9, Auditing Parametrics, and Chapter 10, Technical Evaluations of Parametrics. This section provides a general overview of CER evaluation procedures that can be used by anyone. Such evaluations generally include: Determining if the data relationships are logical; Verifying that the data used are adequate; Performing analytical tests to determine if strong data relationships exist; and Ensuring CERs are used consistently with established policies and procedures, and that they comply with all Government procurement regulations.

B. Logical Data Relationships

CER development and implementation requires the use of analytical techniques. When analyzing CERs, evaluators should be concerned with ensuring that the data relationships used are logical. Potential cost drivers can be identified through a number of sources, such as personal experience, the experience of others, or published sources of information. As an example, during the Parametric Estimating Reinvention Laboratory, one of the IPTs developed a process for identifying possible cost drivers. Using brainstorming techniques, the IPT identified several alternatives for potential cost drivers. The team then surveyed several experts to obtain their feedback on the merits of each potential cost driver. Figure 3-14 contains an example of their survey mechanism.

Figure 3-14: Sample Survey

Using this survey process, the IPT was able to identify the best cost driver candidates for further analysis.
Key questions the IPT considered in making its determination, which should be important to evaluators, are: Does the CER appear logical (e.g., will the cost driver have a significant impact on the cost of the item being estimated)? Does it appear the cost driver(s) will be a good predictor of cost? How accessible are the data (both cost and noncost data)? How much will it cost to obtain the necessary data (if not currently available)? How much will it cost to obtain the data in the future? Will there be a sufficient number of data points to implement and test the CER(s)? Have all potential cost drivers been considered? Were any outliers excluded, and if so, what was the rationale? C. Credible Data Contractors should use historical data whenever appropriate. As described in Chapter 2, Data Collection and Analysis, parametric techniques generally require the use of cost data, technical data, and programmatic data. Once collected, a contractor will normalize the data so it is consistent. Through normalization, data are adjusted to account for effects such as inflation, scope of work, and anomalies. All data, including any adjustments made, should be thoroughly documented by a contractor so a complete trail is established for verification purposes. All data used to support parametric estimates should be accurate and traceable back to the source documentation. Evaluators should verify the integrity of the data collected. This means an evaluator may want to verify selected data back to the originating source. The evaluator may also want to evaluate adjustments made as a result of data normalization to ensure all assumptions made by the contractor are logical and reasonable. Some key questions an evaluator may ask during the review of data collection and normalization processes are: Are sufficient data available to adequately develop parametric techniques? Has the contractor established a methodology to obtain, on a routine basis, relevant data on completed projects? Are cost, technical, and program data collected in a consistent format? Are procedures established to identify and examine any data anomalies? D. Strength of Data Relationships After determining data relationships are logical and the data used are credible, the evaluation should next assess the strength of the relationships between the cost being estimated and the independent cost driver(s). These relationships can be tested through a number of quantitative techniques, such as simple ratio analysis, ANOVA, and statistical analysis. The evaluation should consider the associated risk of the cost and the number of data points available for testing data relationships. Often, there are not a lot of current data available and statistical techniques may not be the best quantitative method to use. This would be the case when a company, out of convenience, establishes simple factors, or ratios, based on prior program experience to estimate items of an insignificant amount. Such factors would not lend themselves to regression techniques, but could be verified using other analytical procedures, such as comparisons to prior estimates. However, when there are sufficient data available, and when the cost to be estimated is significant, statistical analysis is a useful tool in evaluating the strength of CERs. When statistical analysis is performed, refer to the matrix provided in Figure 3-6 as a method for evaluation. E. 
CER Validation

CER validation is the process, or act, of demonstrating the technique's ability to function as a credible estimating tool. Validation includes ensuring contractors have effective policies and procedures; data used are credible; CERs are logical; and CER relationships are strong. Evaluators should test CERs to determine if they can predict costs within a reasonable degree of accuracy. The evaluators must use good judgment when establishing an acceptable range for accuracy. Generally, CERs should estimate costs as accurately as other estimating methods (e.g., bottoms-up estimates). This means that when evaluating the accuracy of CERs to predict costs, assessing the accuracy of the prior estimating method is a key activity. CER validation is an on-going process. The evaluation should determine whether contractors using CERs on a routine basis have a proper monitoring process established to ensure CERs remain reliable. A best practice is to establish ranges of acceptability, or bands, to monitor the CERs. If problems are identified during monitoring, contractors should have procedures in place to perform further analysis activities. In addition, when a contractor expects to use CERs repeatedly, the use of Forward Pricing Rate Agreements (FPRAs) should be considered. FPRAs are discussed in Chapter 7, Regulatory Compliance.

F. Summary of CER Evaluation

CER analysis also requires addressing the following questions: What is the proportion of the estimate directly affected by CERs? How much precision is appropriate to the estimate in total and to the part affected by the CERs? Is there a rational relationship between the individual CER-affected variables and the underlying variables? Is the pattern of relationship functional or purely statistical? If functional, what is the functional relationship, and why? If statistical, is the history of the relationship extensive enough to provide the needed confidence that it operates in the given case? Is the pattern of relationship statistically significant, and at what level of confidence? What is the impact on the estimate of using reasonable variations of the CERs?

VIII. Conclusion

This chapter has presented the concept of CERs and the statistical underpinnings of CER development and application. Basic mathematical relationships were described, and examples showing the use of CERs were also presented. The next chapter builds on this knowledge by discussing the development and application of company-developed (proprietary) models. Typically, these models organize and relate organization-specific CERs into an estimating model.

1 Gujarati, Damodar. Basic Econometrics. McGraw-Hill Book Company, New York, 1978, p. 191.