North Pacific Research Board: Final Report

I. Title: Establishing a statewide data warehouse of salmon size, age and growth records

Principal Investigator(s) and Recipient Organization(s): Beverly Agler, Alaska Department of Fish and Game, Mark, Tag, and Age Lab, 10107 Bentwood Place, Juneau, AK 99801, bev_agler@fishgame.state.ak.us

Project #: R0311

Contract Period and Amount of Funding: July 1, 2003-January 31, 2005

II. Abstract:

Collecting biological data is fundamental to managing and monitoring Alaska’s salmon resources. Throughout Alaska, hundreds of thousands of salmon are examined annually for sex and size information, and scales are collected for later extraction of age data. This represents an enormous sampling effort that over 40-plus years has resulted in collections of data records and scale samples that number in the millions. This information has been used for local management needs, which usually involves compiling and tracking data in summary form. As a result, there has been no common process or protocol for managing and preserving the historical data and scale samples. The intention of this project was to initiate a process for establishing an electronic data warehouse environment through which historical salmon sampling and scale pattern data can be maintained and updated annually from collections throughout the state. This project was designed to inventory all collections and to establish a steering committee composed of state, federal and research interests to develop common protocols and plan a strategic approach. The long-term product will be a standardized web-accessible database where the sampling information will be available to facilitate future research.

III. Purpose:

a. Detailed description of reason for funding and/or problem that was addressed.
The intention of this project was to inventory and to determine the best methodology to provide access to a valuable and unique collection of salmon specimen data collected in Alaska by the Alaska Department of Fish and Game (ADFG). Specifically, we requested funding to inventory all ADFG data collections and to determine the process, protocols, and funding needed to establish a data repository of salmon size, sex, and age data for the State of Alaska. The data repository would ultimately incorporate scale images and the associated growth data as well as make the information available to the public and researchers via a web-based interface. The repository would maintain copies of the data that are collected by the ADFG Regional offices and provide secondary filtering or processing as needed.

While we have referred to this end product as a data warehouse, in the lexicon of data management it would more accurately be described as a repository or a data mart. Data warehouses typically contain diverse data types, which may cover, under one umbrella, the wide-ranging information and data needs of a large institution. They are complex and difficult entities to build and have not always been successfully deployed. In contrast, a data mart provides views and reports of data that have common attributes, as is the case with these sampling data. Data marts require fewer resources for maintaining the data in a repository and have a less complex data structure, and thus a greater likelihood of successful deployment. Data accessed through a data mart can still be matched and merged with other databases.

The first challenge, however, was to establish a repository of the data. These samples were collected over the past 45 years, and the collections are dispersed throughout the state. We also tried to accommodate changes in collecting protocols that may occur in the future.
The protocols for collecting data from salmon in Alaska were largely in place before statehood, when salmon biologists (primarily with the U.S. Bureau of Commercial Fisheries and the University of Washington) used sampling techniques that were commonly used along the Pacific Coast. After statehood, many of these practices, such as storing scales on gum cards in the field and making impressions of these scales on acetate cards for examination of age and growth and for archiving, were adopted by state biologists and are still in use today. To keep track of the records, an index number was used to associate scale cards with datasheets containing sampling information and the individual size and sex data. After an experienced reader viewed the scale acetate cards with a microfiche projector, the ages were recorded and tabulated to determine age, sex and size composition of the sample.

Thus, these data are extensive. Scales from all salmonid species have been collected within Alaska from 1960 to the present. Currently, the number of records exceeds 12 million, and this number increases each year. Most major drainages and stocks of salmon in Alaska have been sampled, and in many locations, this sampling has been continuous since statehood. These data are primarily housed in the four regional offices of ADFG, though portions of collections are stored at the field offices.

In the late 1970s, the ADFG Division of Commercial Fisheries established a statewide stock separation program, which developed common protocols for recording the age, sex, and length data in standardized data fields. These protocols were deployed throughout the state and provided a level of consistency. This centralized program, however, was dissolved in the early 1980s, and project leaders began to modify data collection methods and the codes used to differentiate items such as gear type, location, etc. to suit the requirements of an individual project.
The advent of microcomputers enabled the transfer of the raw data from the forms to text files. Some of these files were saved on 5 1/4” floppy disks, whereas others were transferred to separate R:Base and Access databases on desktop PCs. Data from many systems, prior to the 1980s, are available only from the original datasheets.

In recent years, a number of research projects have utilized these collections by extracting growth data from scales as a means to investigate salmon production in both marine and freshwater environments. Although most of these projects used image-processing methods to extract measurements from the scales, the methods were not easily reproducible among studies. The resulting datasets, which were generally kept on spreadsheets, were not compatible with other datasets. Recent projects have addressed this issue by preserving scales as high-resolution images and documenting the criteria used to extract measurements. Hagen et al. (2001) extracted growth data and combined these with the original sampling data in relational databases. Ultimately, matching and merging these scale data with the salmon sampling data, and perhaps with other products derived from sampling, such as genetic or otolith stock identification, remains a long-term task.

b. Objectives of the project.

There were two major concerns that this project intended to address: loss of historical sampling data and loss of the opportunity to create an integrated structured data approach in which the whole will be greater than the sum of the parts. While there was general recognition that the historical salmon data were valuable, ADFG does not have the resources to treat the records as true archival documents. Often portions of the data have been borrowed, and samples were lost or misplaced in the process. In addition, through personnel turnover, there has been a loss of continuity on how and where to preserve the samples after they were collected and used.
Failure to combine datasets across the state has also been a concern. Although each region intended to modernize its data collection and management, these efforts have occurred on different schedules and, in the absence of common protocols, have resulted in datasets that are less compatible. The benefits of combining datasets should be obvious: salmon produced in each region share their time in the ocean with each other as well as with salmon from other countries. By not utilizing all the datasets, we lose an opportunity to see common patterns and trends that otherwise might improve our understanding of salmon production and increase our ability to forecast returns.

Although conceived as a multiyear project, this initial year of funding was to evaluate the project, identify problems and solutions, and investigate options for further funding. The bulk of these funds was used to conduct a comprehensive inventory of the state’s salmon sampling data collections. The inventory process forced regional offices to locate and identify available electronic and hard copy data, note the quality and condition of the information, identify common and conflicting data fields, and assess potential problems in combining the datasets. The inventory, as well as the data standards developed by the Data Standards Committee, has been instrumental in estimating the development and maintenance costs for a statewide database. The steering committee also identified other funding opportunities to cover infrastructure costs and assisted in creating partnerships between state and federal agencies as well as the research community.

Thus, our objectives included:

1. Inventory all scale collections statewide to:
   a. Determine presence or absence of gum cards,
   b. Determine presence or absence of acetates,
   c. Determine presence or absence of associated data, and
   d. Provide access to the inventory through the Internet.
2. Establish a statewide steering committee composed of state, Federal and non-governmental researchers to:
   a. Develop a common vision of a web-accessible central repository of these data,
   b. Facilitate communication and evaluate policy and priorities for establishment and maintenance of the data warehouse,
   c. Establish data standards to ensure present and future compatibility,
   d. Supervise completion of the inventory of sample records, and
   e. Continue scoping issues applicable to achieving the long-range goal of creating a central repository of the sampling data.

IV. Approach:

a. Detailed description of the work that was performed, to include (if applicable): methods or techniques and materials used, design of the study, sample sizes planned, kind of controls and proposed analysis for the results.

First, we established a steering committee of state, Federal and non-governmental researchers, and we held our first meeting in Anchorage on September 29, 2003. Sixteen representatives from the four ADFG regions as well as representatives of other federal and research interests attended the meeting.

At this meeting, we discussed the background and rationale for this grant. Pete Hagen, National Marine Fisheries Service (NMFS), Auke Bay Lab, outlined some potential funding sources for creation of a statewide database. Bev Agler, ADFG, Mark, Tag, and Age Lab (MTA), gave a short presentation describing the retrospective studies being conducted by the MTA Lab in conjunction with USGS, NMFS and the North Pacific Marine Research Program. The growth data generated from the digital images of the scales have been analyzed extensively by Greg Ruggerone, NRC, Inc. Ruggerone recently published several papers in which he correlated growth of sockeye salmon with climate change and demonstrated an inversely proportional relationship between Bristol Bay sockeye salmon growth and the abundance of Asian pink populations (Ruggerone and Goetz 2004, Ruggerone and Rogers 2003, Ruggerone et al.
2003, 2005). Phil Mundy, representing the Exxon Valdez Oil Spill Trustee Council and a member of the NPRB board, spoke about the funding priorities of these organizations. James Brady, Ecotrust and the Wild Salmon Center, Portland, Oregon, described the State of the Salmon project, sponsored by the Moore Foundation. They plan to build a database of salmon population data for California, Oregon, Washington, British Columbia, Alaska, Russia, and Japan, and to create a web-based iterative system using a metadata geo-referenced approach to facilitate ease of information exchange.

Bill Johnson, Information Technology (IT) staff for ADFG’s MTA Lab, and Steve Gebhart, IT staff for ADFG Headquarters, spoke about management of large databases and how ADFG has handled these statewide, centralized repositories. ADFG is currently creating a division-wide (Commercial Fisheries) metadata system.

Representatives from each of the four ADFG regions (see Section IV.b, Project Management) outlined the present status of their data archives. Each region described their current archiving techniques and data storage. Most data are stored in text files by year and sample site and are only accessible by opening a file for each year. Thus, to examine 40 years of data, a researcher must examine 40 different files. The MTA Lab has created Microsoft Access databases for the few systems they have analyzed for historical scale growth data, but the databases are incomplete. Where no electronic data existed, usually prior to 1985, only data for the scales that were examined were included in these databases. Regions 1 and 3 have been compiling files into larger databases and have added some historical data as well. Regional representatives also discussed present data collection methods and exchanged information on new techniques to standardize this process.

Region | No. of ASL records (millions) | Comments | Data storage
1 – Southeast Alaska | 3.2 | Records increased by 0.5 million per year | ORACLE database, includes historical data
2 – PWS, Cook Inlet, Bristol Bay | 4 | Represents 35-40 years of data. Bristol Bay scales make up bulk of this at 1-2 million | Some Access files, mostly individual text files
3 – Yukon Kuskokwim | ~2-3 | Estimates, based on summary file data | Data were being loaded into Access files as they are screened from hard copies and text files
4 – Kodiak | 2 | Adults, 1985-present, older data not included. Smolts, 1989-present (0.25 million). Sample 50,000-100,000 fish per year | 1985-present in R:Base. Older data in banker boxes, some stored at seasonal field stations
Statewide totals | ~11-13 | Sample rate ~1-2 million per year |

As shown above, approximately 11-13 million salmon scales and their associated sampling data have been collected by ADFG since statehood. Efforts to compile these records electronically started independently in the various regions. However, there has been no common process or coordination. A primary outcome of the initial meeting was an agreement to form a Data Standards Committee to develop protocols for field and table definitions and data exchange.

The participants of the first meeting agreed upon a common vision of creating a web-accessible central repository of the data. There was also recognition that a large amount of effort is needed to verify and edit the raw data if they are to be retrievable electronically, and this would have to occur in stages. The establishment of data standards will help ensure compatibility during this process.

Consequently, we held a second meeting in Anchorage on January 15, 2004. This meeting was attended by the nine members of the four ADFG Regional Offices and the MTA Lab, who were appointed to the Data Standards Committee.
Discussions at this meeting focused on two major topics: (1) creation of an inventory of all ADFG age, sex, length data and (2) development of data standards for a statewide database. Each region described the status of their electronic inventories with regard to scale and age, sex, length data.

1. Region 1 (Southeast Alaska) had all data in ORACLE from which they could create an inventory.
2. Region 2 (Central Region, Prince William Sound, Cook Inlet, Bristol Bay) was compiling an inventory for Bristol Bay, but other areas, such as Cook Inlet and Prince William Sound, still needed to be addressed.
3. Region 3 (Arctic, Yukon, Kuskokwim Region) was in the process of updating their inventory. Their inventory was entirely electronic and was used as a template for the statewide inventory.
4. Region 4 (Kodiak, Chignik, Alaska Peninsula and Aleutian Island fisheries) had 1985-present data in R:Base, so it was possible to extract an inventory from those data. It was questionable whether they could inventory older data, because many of the older records and scale cards were housed at field offices that were inaccessible at this time of year. Luckily, this grant provided the impetus and funding to assemble these records. They are now all stored in Kodiak.

This NPRB grant provided approximately one month of salary for a Fish and Wildlife Technician in each region to complete the inventories.

The Data Standards Committee chose to follow the format of data standards adopted by the Pacific Salmon Commission. The committee developed a set of data standards, which can be viewed on the Internet at: http://www.taglab.org/ASL/reports/ASL%20spec.doc. These protocols were circulated among committee members via email and updated at the May and August meetings of the Steering Committee.
Once data transfer protocols (Appendix A) were discussed, these data standards were changed to a slightly different format viewable on the Internet at: http://www.taglab.org/ASL/reports/ASL%20specX.doc. The group originally developed a two-table structure for the database, but this was changed to a single flat table to simplify data transfer.

The steering committee met again in Anchorage May 18, 2004 to coordinate work on the statewide inventory of age, sex and length data. The committee discussed the steps necessary to develop a statewide ASL database.

1. Development of the database (ORACLE).
2. Addition of new data to the database.
3. Recovery of the historic electronic data, generally 1985-present.
   a. Data would need to be verified and formatted.
4. Recovery and entry of data from non-electronic sources (e.g. data forms).
5. Linking of the ASL database with the digitized scales and their associated growth and age data housed at the MTA Lab.

The steering committee also developed a list delineating the steps required to create a statewide database:

1. Development of metadata standards.
2. Development of an ORACLE database with a Microsoft Access user front end to be used to interface and query the database.
3. Development of Web-accessible site.
4. Additional data verification and data retrieval by MTA Lab.
5. Data recovery of current electronic data by regions.
6. Data recovery of historical records by regions, including data entry.

“Scrubbing” or verification and editing of the data was discussed in some detail. When the electronic inventory for Region 3 was first added into ORACLE, a number of rows were lost due to inconsistencies in data entry. Consequently, data will have to be edited for errors and inconsistencies to prevent loss of important data.

The group also discussed where a statewide ASL database could be housed.
Most participants felt that the MTA Lab had the infrastructure, because of the large databases they currently maintain that are associated with the Coded Wire Tag and the Thermal Mark Laboratories. Bill Johnson stated that the MTA Lab would need some additional hardware to house another large database, and that the age, sex, length database would be much larger than the Coded Wire Tag database.

Although not specifically part of the grant, these meetings brought together ADFG regional staff who are usually separated by distance and allowed us to discuss topics related to these age, sex, length data. For example, John Wilcock demonstrated some data collection methodologies that he has been developing to use WinCE and PalmOS applications to enter data directly into a database. Electronic data collection would eliminate the need for the computer forms currently in use, which have had problems with inconsistencies if not filled out correctly. This project would ultimately benefit all regions. Feasibility testing for this new data entry system was scheduled for Ketchikan during summer 2004, with major testing of other types of data collection in summer 2005.

To coordinate work on a new proposal and the final report for this grant, the steering committee conducted its final meeting in Juneau on 9 August 2004. First, we discussed the status of the regional inventories. All regions had submitted at least partial inventories. Bill Johnson demonstrated the web-accessible inventory (Appendix B) http://www.taglab.org/ASL/reports/default.asp. Reports can be run either to the screen or to a comma-separated (CSV) file, which can then be opened in a program of the user’s preference, such as Microsoft Excel. A Microsoft Access version of the ASL inventory is also available by accessing the MTA Lab’s ftp server at ftp://ftp.taglab.org/. Johnson also demonstrated the Test ASL database, located on the Internet at the same URL and listed as ASL Specimen Report.
The ASL Specimen Report provides access to a “demonstration only” ORACLE database with three years of age, sex, length data from Region 1. It was developed to demonstrate the potential accomplishments as this project progresses.

The third report available at this website is the ASL Availability Report. This report was developed to allow regions and MTA Lab staff to track submissions to the ASL database. This report advises regional personnel on the accessibility of the data. It was suggested that data be transferred to the MTA Lab once a year. The data would then be added to the ORACLE database. Once added, the data would appear on the ASL Availability form.

We finished the meeting by discussing how to refine the overall goals of the project. The committee discussed whether this should be a database where data were entered directly from the field and then used by managers for direct management needs, or whether this should just be a central repository archiving the historical records for the state of Alaska. Most regional participants felt that the database would not be responsive enough for the former alternative. Regions would still be required to coordinate and house smaller regional databases. Data would then be added to the statewide database by year and region, starting with the current data and working back through the historical data. Consequently, regions would still be responsible for their own management reports and queries needed during field season.

b. Project Management: List individuals and/or organizations actually performing the work and how it was done.

1) Bev Agler, Fishery Biologist III, served as the Principal Investigator. She coordinates the scale pattern research conducted by the ADFG’s MTA Lab and directs the efforts to create a comprehensive inventory of the state’s salmon sampling records. She organized the meetings and drafted the minutes for each meeting among the members of the steering committee.
Each region prepared an inventory, which was then submitted to the MTA Lab. Agler supervised completion of the final inventory in Microsoft Access and worked closely with Bill Johnson (see below) to develop the web-accessible version.

2) Bill Johnson, Analyst Programmer IV for ADFG’s MTA Lab, provided technical expertise on administration and construction of a data warehouse. He has over 20 years of experience in IT development and has worked with diverse fisheries datasets. He converted the Microsoft Access inventory to ORACLE and developed the web-accessible interface that allows users to access and search the data through the Internet.

3) A steering committee was established August 2003. It was composed of members representing the regional offices of ADFG.
   a. Region 1 (Southeast Alaska)
      i. John Wilcock, Fishery Biologist III, who is currently coordinating a project to integrate ASL data for Region 1, and
      ii. Scott Johnson, Analyst Programmer IV.
   b. Region 2 (Central Region- Prince William Sound, Cook Inlet, and Bristol Bay)
      i. Tim Baker, Research Analyst III,
      ii. Lowell Fair, Area Research Biologist, and
      iii. Fred West, Assistant Area Research Biologist.
   c. Region 3 (Arctic-Yukon-Kuskokwim Region)
      i. Linda Brannian, Fishery Biologist IV,
      ii. Seth Darr, Analyst Programmer IV, and
      iii. Holly Moore, Analyst Programmer III.
   d. Region 4 (Kodiak, Chignik, Alaska Peninsula and Aleutian Islands)
      i. Patty Nelson, Regional Finfish Research Supervisor, and
      ii. Matt Foster, Kodiak Finfish Research Biologist.
   e. Other members
      i. Dr. Peter Hagen, NOAA-NMFS, Auke Bay Laboratory, facilitated the creation of the steering committee and incorporated a research partnership perspective,
      ii. Other personnel attended the meetings depending on location, such as Dion Oxman and Bill Rosky from the MTA Lab when the meeting was held in Juneau,
      iii. Some participants were invited to lend their perspectives to issues such as potential funding and management of large data warehouses.
         1. Dr. Phil Mundy, EVOS Trustee Council
         2. Tom Jarvis, Analyst Programmer, ADFG
         3. James Brady, Ecotrust, Wild Salmon Center
         4. Steve Gebert, ADFG Headquarters, IT Staff
         5. Rob Bochenek, EVOS Trustee Council, GEM Data Systems Manager
         6. Ron Josephson, Alaska Mark and Tag Coordinator, MTA Lab

V. Findings:

a. Actual accomplishments and findings.

The Data Standards Committee developed the field and table definitions for the data warehouse and suggested protocols for data exchange. These data standards (Appendix A) can be accessed via the Internet at: http://www.taglab.org/ASL/reports/ASL%20spec.doc.

As part of this process, the MTA Lab compiled several Internet-accessible reports to demonstrate how data could be compiled and accessed more easily through a statewide program. Several reports are available by going to http://www.taglab.org/ASL/reports/. The first report is a comprehensive inventory of scale specimens (Appendix J). Selecting “ASL Inventory Report” takes the user to a screen for querying an ORACLE database containing inventories of most of the scales collected by the four regions. This screen allows the user to select year, species, district, sub-district and location. The underlying database contains ~18,000 rows of data. Once these items have been selected, the report can either be displayed or sent to a file that can be opened with Microsoft Excel.

For demonstration purposes only, we developed an “Age, Sex, Length Specimen Report” that shows some possible uses of these data (Appendix K). Currently, the database only contains three years of ASL data (1982, 1992, and 2002) from Southeast Alaska. Again, various criteria are available to allow the user to specify boundaries on the query and thus limit the amount of data returned by the database.
The third report (Appendix L), the “ASL Availability Report,” was developed to help regional personnel follow the data and determine which files have been submitted to the data warehouse, which files have been validated, and which files are now accessible through the Internet database.

We also started developing Data Transfer Protocols. These guidelines can be accessed via the Internet at: http://www.taglab.org/ASL/reports/ASL%20specX.doc.

While research interests were originally to serve in partnership on the steering committee, it was apparent after the first meeting that much of the initial work required the input of ADFG staff familiar with the data. It is anticipated that research perspectives will be brought back in during subsequent phases to help establish priorities in developing reports and delivery products.

b. If significant problems developed which resulted in less than satisfactory or negative results, they should be addressed.

No significant problems developed. A couple of small collections were not inventoried within the framework of the grant, but they will be inventoried at the earliest opportunity, and the data will be added to the inventory.

c. Describe needs of additional work, if results suggest further study.

Given the extensive geographic and temporal range represented by this collection, compiling these data in an accessible format will augment future research and monitoring programs. The size and complexity of this endeavor, however, require that it be implemented in phases. Phase 1, the focus of this report, has been completed. In Phase 2, the proposed work will involve populating this database with all available electronic records, developing reports and procedures for annually updating the database from the various collection sites, and addressing the recovery and data entry of records available only as hard copies on historic data sheets.
Effort will also be directed towards planning the final phase of the project, which would be to complete historical data entry of key regional systems and to join this database to derivative products collected from other funding sources, such as the growth histories of salmon extracted from the scales via image analysis, which are currently housed in separate databases at the MTA Lab. Scale growth data in particular, which in their simplest form may contain up to 120 measurements per specimen, provide an enormously detailed means of tracking changes in marine conditions that support salmon production and have proven useful in formulating predictive models. Capturing and preserving these records in a statewide database would be a valuable contribution to long-term monitoring and predictive modeling programs.

VI. Evaluation:

a. Describe the extent to which the project goals and objectives were attained. This description should address the following:
   1. Were the goals and objectives attained?
   2. Were modifications made to the goals and objectives? If so, explain.

As stated in section IIIb, our major objectives were to inventory all commercial salmonid scale collections statewide to determine presence or absence of scales and the associated data and to provide access to this inventory through the Internet. We also planned to establish a steering committee of state, Federal and non-governmental researchers to develop a common vision of a web-accessible central repository of these data, facilitate communication, and evaluate policy and priorities for establishment and maintenance of the data warehouse. We established data standards to ensure compatibility during this process and in the future, supervised completion of an inventory of sample records, and continued scoping issues applicable to achieving the long-range goal of creating a central repository of the sampling data.
We were able to achieve these goals and objectives and have developed a proposal to carry this project into the second phase, converting the data into a data warehouse or data mart.

b. Dissemination of project results: Explain, in detail, how the project results have been, and will be, disseminated.

To disseminate the project results, we developed several web-accessible forms. For example, the data standards (Appendix A-I) developed by the Steering Committee can be accessed via the Internet at: http://www.taglab.org/ASL/reports/ASL%20spec.doc. Several reports (Appendix J-L) are accessible at: http://www.taglab.org/ASL/reports/. The first report is a comprehensive inventory of scale specimens (Appendix J). Selecting “ASL Inventory Report” takes the user to a screen for querying an ORACLE database containing inventories of most of the scales collected by the four regions; the underlying database contains ~18,000 rows of data. This screen allows the user to select year, species, district, sub-district and location. Once these items have been selected, the report can either be displayed or sent to a file that can be opened with Microsoft Excel. Please see Section Va. “Actual Accomplishments and Findings” for further details on accessing these reports.

References

Hagen, P.T., D.S. Oxman, and B.A. Agler. 2001. Developing and deploying a high resolution imaging approach for scale analysis. (NPAFC Doc. 567). 11 p. Mark, Tag, and Age Lab, Alaska Department of Fish and Game, Juneau, Alaska.

Ruggerone, G. T. and Rogers, D. E. 2003. Multi-year effects of high densities of sockeye salmon spawners on juvenile salmon growth and survival: a case study from the Exxon Valdez oil spill. Fish. Res. 63: 379-392.

Ruggerone, G. T., Zimmermann, M., Myers, K. W., Nielsen, J. L., and Rogers, D. E. 2003. Competition between Asian pink salmon (Oncorhynchus gorbuscha) and Alaskan sockeye salmon (O. nerka) in the North Pacific Ocean. Fish. Oceanogr. 12: 209-219.

Ruggerone, G. T. and Goetz, F. 2004. Survival of Puget Sound chinook salmon (Oncorhynchus tshawytscha) in response to climate-induced competition with pink salmon (Oncorhynchus gorbuscha). Can. J. Fish. Aquat. Sci. 61: 1756-1770.

Ruggerone, G. T., Farley, E., Nielsen, J., and Hagen, P. 2005. Seasonal marine growth of Bristol Bay sockeye salmon (Oncorhynchus nerka) in relation to competition with Asian pink salmon (O. gorbuscha) and the 1977 ocean regime shift. Fish. Bull. 103: 355-370.

Appendices

Appendix A-I. Data Transfer Protocols developed in conjunction with the Data Standards Committee to simplify transfer of data from regional data collections to the statewide data warehouse.

Appendix J. Example of the ASL Inventory Form available on the Internet at: http://www.taglab.org/ASL/reports/inventory.asp. Queries of this database can be limited by choosing years, species, districts, sub-districts, and location or some combination of these items. The second page shows the results of a query for two years of data (2000 and 2001) for chum and sockeye salmon from District 325 (Nushagak area).

Appendix K. Example of the ASL Specimen Report available on the Internet at: http://www.taglab.org/ASL/reports/sample.asp. This is also called the Age Sex Length Sample Form. This demonstration database contains only three years of data from Southeast Alaska. It was developed to demonstrate queries that could be created once the data warehouse was in operation. The second page shows the results of a query for sockeye salmon from 2002 in District 101.

Appendix L. Example of the ASL Availability Report available on the Internet at: http://www.taglab.org/ASL/reports/availability.asp. This report allows regional users of the data warehouse to track submitted data and see whether they have been received and validated.
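The query criteria described for the ASL Inventory Form (years, species, districts, sub-districts, location, or any combination) behave like a filter in which every supplied criterion must match. The Python sketch below illustrates that idea only. The function name `query_inventory`, the field names, and the in-memory list are hypothetical; the four sample rows are copied from the Appendix J example output, and the real report runs against an ORACLE database whose schema is not shown in this report.

```python
# Hypothetical in-memory stand-in for the inventory table.
# Row values are copied from the Appendix J example report.
INVENTORY = [
    {"year": 2000, "species": "420", "district": "325",
     "subdistrict": "10", "location": "Igushik Tower"},
    {"year": 2000, "species": "450", "district": "325",
     "subdistrict": "30", "location": "Nushagak Escapement"},
    {"year": 2001, "species": "420", "district": "325",
     "subdistrict": "30", "location": "Wood River Tower"},
    {"year": 2001, "species": "450", "district": "325",
     "subdistrict": "10", "location": "Igushik Section, Dillingham"},
]


def query_inventory(rows, years=None, species=None, districts=None,
                    subdistricts=None, location=None):
    """Return rows matching every criterion supplied; None means 'All'."""
    def allowed(value, choices):
        return choices is None or value in choices

    return [
        row for row in rows
        if allowed(row["year"], years)
        and allowed(row["species"], species)
        and allowed(row["district"], districts)
        and allowed(row["subdistrict"], subdistricts)
        # Location is treated here as a case-insensitive substring match
        # (an assumption, not the form's documented behavior).
        and (location is None or location.lower() in row["location"].lower())
    ]


# Chum (450) and sockeye (420) from District 325 in 2000 only.
result = query_inventory(INVENTORY, years={2000},
                         species={"420", "450"}, districts={"325"})
towers = query_inventory(INVENTORY, location="tower")
```

Leaving a criterion as `None` mirrors selecting "All" on the form, so criteria can be combined freely.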
Appendix A

Salmon Age-Sex-Length Data Transfer Protocol

Variations of format, coding, and definitions have been used among regions in collecting Alaska’s detailed salmon Age-Sex-Length data. Differences have compounded over time, as staffing and program requirements changed. This protocol is defined to facilitate the transmission of Alaska’s historic and future salmon Age-Sex-Length biological data into a central repository. It provides a clear specification that explains the nature of data to the users, and a well-defined transmission mechanism for populating the repository.

A. Data Submission Specification

Max Chars refers to the maximum number of characters a field’s value may contain.
Reqd indicates whether the field must contain a value (nulls ARE NOT accepted when Reqd=Yes).
Data Type references how the column should be defined in a relational database.
Validation gives specific rules that the submission must fully meet in order to be accepted into the repository.

ASL Repository Data Submission Specification

| Ref | Column Name (Alternate Name) | Max Chars | Reqd | Data Type | Description | Validation |
|---|---|---|---|---|---|---|
| 1 | Sample_ID | 23 | | Character | Value generated by the reporting region, which may be used to associate specific records in the original regional data as belonging to a particular sampling event. Uniqueness is desired, but not mandatory. | |
| 2 | Region_ID | 1 | Yes | Character | Commercial Fisheries region of collection | Must be ‘1’, ‘2’, ‘3’ or ‘4’ |
| 3 | Sample_Year | 4 | Yes | Character | 4-digit year in which sampling event occurred | Must be between 1930 and the current calendar year |
| 4 | Management_area | 3 | | Character | Geographic area spanning a number of districts. Typically defined at the region level for their areas of interest. | |
| 5 | Tix_management_area | 1 | | Character | Management Area code formally defined by the fish ticket system. | If present, must match an existing code in Appendix A |
| 6 | District | 3 | | Character | Three character district of observation | If present, must match a current or historic CF fish ticket district |
| 7 | Subdistrict | 3 | | Character | ID for subset of district sampled, if any | If present, must be exactly two digits |
| 8 | Stream | 50 | | Character | ID for anadromous waters catalog stream sampled. R 3 always 0 | |
| 9 | Location | 3 | | Character | R 1: port codes; R 2: stream location; R 3: stream location, some fish tickets; R 4: some stream locations | |
| 10 | Project | 2 | | Character | “Fishery Type” Legacy code. Typical values are in Appendix I | |
| 11 | Sample_Day | 2 | | Character | Day this sample was taken, or began to be taken. | If present, must be 1 or 2 digits in range 1 through 31 |
| 12 | Sample_Month | 2 | | Character | Month number when this sample was started | If present, must be 1 or 2 digits in range 1 through 12 |
| 13 | Sample_Date | 10 | | Date | Single date sample was taken | If present, must be a valid date. mm/dd/yyyy |
| 14 | Gear | 2 | | Character | Type of collection gear. Typical values are in Appendix H. | |
| 15 | Harvest_Code | 2 | | Character | Type of commercial fishery sampled | If present, must match a current or historic Harvest Code in Appendix B |
| 16 | Mesh | 5 | | Character | Net mesh size. Converted to inches. None for R1 and R4 | |
| 17 | Length_Type (Measurement Type) | 2 | | Character | Codes indicating type of length measurement | If present, must match a mark-sense length code in Appendix C |
| 18 | Number_Scales | 1 | | Number | Number of scales per fish | If present, must be a valid whole number with no punctuation |
| 19 | Number_Cards | 1 | | Number | Number of gum cards taken for a particular mark-sense form | If present, must be a valid whole number with no punctuation |
| 20 | Form_Number | 15 | | Character | Mark-sense data form sequence number | If present, must be digits |
| 21 | Species | 3 | Yes | Character | Salmon species code | Must match a Species Code in Appendix D |
| 22 | Stage | 1 | Yes | Character | Indicates Juvenile, adult, etc. | Must match a stage code in Appendix E |
| 23 | Batch_Number (Data Link) | 20 | | Character | Region specific: track uploading of data or data source | |
| 24 | Stat_week | 2 | | Character | Statistical week | If present, must be 1 or 2 characters representing a number between 1 and 54 |
| 25 | Period | 2 | | Character | Openings (e.g. R3) | |
| 26 | Comments | 50 | | Character | | |
| 27 | Specimen_ID | 4 | | Character | Value generated by the submitter which identifies a particular observation in a sample. Typically, they range 1 through n. | |
| 28 | Card_Number | 3 | | Character | Sequence number of gum card in a particular collection | |
| 29 | Fish_Number | 5 | | Character | | |
| 30 | Sex | 1 | | Character | Male, female, indeterminate | If present, must match an existing sex code in Appendix F |
| 31 | Length | 4 | | Number | Length of fish in millimeters | If present, must be a valid whole number |
| 32 | FW_Age | 1 | | Character | Freshwater age using European method | If present, must be a digit |
| 33 | SW_Age | 1 | | Character | Saltwater age using European method | If present, must be a digit |
| 34 | Age_Error_code | 10 | | Character | String of one or more digits indicating problems in aging. Regional use varies somewhat. Typical values are in Appendix G. | If present, must be composed of digits 0 through 9 |
| 35 | Weight | 7 | | Number | Weight in grams to the nearest tenth of a gram | If present, must be a number in the range 0.1 through 99999.9 |
| 36 | CWT_Head_Number (Strap Tag, Cinch Strap) | 6 | | Character | Six digit strap tag number identifying a head collected for the coded wire tag lab | If present, must be all digits |
| 37 | Mark_Recapture_Tag | 10 | | Character | Up to 10 characters from a tag used in mark-recapture programs: disk tag, spaghetti tag, etc. | |
| 38 | DNA_Number (Silly Code) | 25 | | Character | Up to 10 characters used to identify DNA specimen collected for genetics lab | |
| 39 | Otolith_Number (BP Coordination #, Brain Parasite #) | 8 | | Character | Up to 8 characters used to identify a specimen collected for otolith lab | |
| 40 | User_code | 8 | | Character | Region 1 only | |
| 41 | Image_Name | 12 | | Character | File name for digitized scale image | |
| 42 | Format | 6 | Yes | Character | The version of the transfer specification used to build the file. Every row in the file must have this same value. | Must be “1.2X” |

B. Transmission Mechanism

The initial transmission mechanism addresses populating the repository with historic data.
Once a solid set of experience is obtained in this process, the mechanism for collecting current season data will be addressed in detail.

Data Characteristics:

1. All submitted data must be presented in comma-separated value (CSV) files using a Windows-compatible character set, preferably a version of printable ASCII.
2. All files must contain only newline-delimited records. That is, there must be one record per line of the file.
3. Any field whose value contains an embedded comma (,) must be surrounded in double-quotes ("). It is permissible for all fields to be reported with surrounding double-quotes. Any double-quote delimiters will be stripped from the fields as they are stored in the database.
4. No double-quotes are allowed as data values of any data field. The double-quote is sequestered for exclusive use as a field delimiter.
5. The first record in the file must contain Column Names as they are defined in the specification. This serves as inline documentation. The first row of a file will always be skipped when the repository is loaded. But the header will make any submitted file readily identifiable to staff managing the repository, regardless of the file’s name.
6. All fields which do not contain a data value are considered NULL. The fields for which data are absent must be denoted in the file using two consecutive commas (,,).

File Scope:

1. For purposes of reporting historic data, a report file shall be all the data for a particular REGION_ID in a particular SAMPLE_YEAR. Those values must be constant in every record of a particular file.
2. Each file submission will be validated according to the mandatory rules in the Specification. Any rows having invalid data will be reported back to the originator. The originator will resubmit the complete file with errors corrected.
3. As soon as a file is determined to meet the validation rules of the specification, it will be loaded into the repository. Before inserting the file contents into the database, all existing data for the region and year will be destroyed. This is necessary to prevent multiple copies of records from accumulating in the repository: records are not required to have unique keys in the repository and cannot be readily deleted or updated on an individual basis.
4. Files may be submitted to the repository by uploading them to ftp://FTP.TAGLAB.ORG.
5. Because there is adequate bandwidth and file space for the repository, submissions should not be compressed.

Appendix A – Standard Tix Management Area Codes

| CODE | DESCRIPTION |
|---|---|
| 9 | CANADA |
| A | JUNEAU/YAKUTAT |
| B | KETCHIKAN/CRAIG |
| C | PETERSBURG/WRANGELL |
| D | SITKA/PELICAN |
| E | PRINCE WILLIAM SOUND |
| F | EEZ |
| H | COOK INLET |
| K | KODIAK |
| L | CHIGNIK |
| M | ALASKA PENINSULA |
| O | DUTCH HARBOR |
| Q | BERING SEA |
| R | ADAK/WEST ALEUTIANS |
| S | SOUTHEAST INSIDE (1990-1998) |
| T | BRISTOL BAY |
| W | KUSKOKWIM |
| X | KOTZEBUE |
| Y | YUKON |
| Z | NORTON SOUND |

Appendix B – Standard Harvest Codes

| CODE | DESCRIPTION |
|---|---|
| 11 | TRADITIONAL |
| 12 | TERMINAL AREA |
| 13 | EXPERIMENTAL AREA |
| 14 | EXPERIMENTAL GEAR |
| 17 | M-I-C |
| 18 | CONFISCATED |
| 21 | PNP FISH |
| 22 | PNP CARCASSES |
| 23 | STATE FISH |
| 24 | STATE CARCASSES |
| 25 | FEDERAL FISH |
| 26 | FEDERAL CARCASSES |
| 27 | PNP DONATED |
| 28 | PNP DISCARDED |
| 31 | DERBY |
| 33 | DISCARDED |
| 34 | OILED WASTE |
| 35 | EDUCATIONAL |
| 36 | COMMERCIAL DONATED |
| 41 | TEST RUN ASSESSMENT |
| 42 | TEST SPECIAL STUDY |
| 43 | TEST STOCK ASSESSMENT |

Appendix C – Standard Length Type Codes

| CODE | DESCRIPTION |
|---|---|
| 00 | LENGTH NOT TAKEN |
| 01 | TIP OF SNOUT TO FORK OF TAIL |
| 02 | MID-EYE TO FORK OF TAIL |
| 03 | POST ORBIT TO FORK OF TAIL |
| 04 | MID-EYE TO HYPURAL PLATE |
| 05 | POST ORBIT TO HYPURAL PLATE |
| 06 | TIP OF SNOUT TO TIP OF TAIL |
| 07 | CLEITHRAL ARCH TO TIP OF TAIL |
| 08 | CALCULATED FORK LENGTH |

Appendix D – Standard Species Codes

| CODE | DESCRIPTION |
|---|---|
| 410 | CHINOOK |
| 420 | SOCKEYE |
| 430 | COHO |
| 440 | PINK |
| 450 | CHUM |
| 470 | CUTTHROAT |
| 540 | STEELHEAD |
| 666 | ATLANTIC |

Appendix E – Standard Stage Codes

| CODE | DESCRIPTION |
|---|---|
| A | ADULT |
| E | EMERGENT FRY |
| F | FED FRY |
| G | FINGERLING |
| I | IMMATURE |
| J | JUVENILE |
| P | PRESMOLT |
| S | SMOLT |

Appendix F – Standard Sex Codes

| CODE | DESCRIPTION |
|---|---|
| 1 | MALE |
| 2 | FEMALE |
| 3 | EXAMINED BUT DID NOT IDENTIFY |

Appendix G – Typical Age Error Codes

| CODE | DESCRIPTION |
|---|---|
| 1 | OTOLITH |
| 2 | INVERTED |
| 3 | REGENERATED |
| 4 | ILLEGIBLE |
| 5 | MISSING |
| 6 | REABSORBED |
| 7 | WRONG SPECIES |
| 8 | NOT PREFERRED |

Appendix H – Typical Gear Codes

| CODE | DESCRIPTION |
|---|---|
| 00 | TRAP |
| 01 | PURSE SEINE |
| 02 | BEACH SEINE |
| 03 | DRIFT GILLNET |
| 04 | SET GILLNET |
| 05 | HAND TROLL |
| 06 | LONG LINE |
| 07 | OTTER TRAWL |
| 08 | FISHWHEEL |
| 09 | POTS |
| 10 | SPORT HOOK AND LINE |
| 11 | HERRING PURSE SEINE |
| 12 | HANDPICKED |
| 13 | DIP NET |
| 14 | WEIR |
| 15 | POWER TROLL |
| 17 | BEAM TRAWL |
| 18 | SHOVEL |
| 19 | WEIR |
| 90 | TRAP |
| 91 | POTS |

Appendix I – Typical Project Codes

| CODE | DESCRIPTION |
|---|---|
| 1 | COMMERCIAL HARVEST |
| 2 | SUBSISTENCE HARVEST |
| 3 | ESCAPEMENT (TOWER, WEIR, SONAR SITE, ETC.) |
| 4 | ESCAPEMENT – SPAWNING GROUNDS |
| 5 | TEST FISHING |
| 6 | SPORT CATCH (MARINE) |
| 7 | SPORT CATCH (FRESHWATER) |

Appendix J.

This ASL Report was generated on 12/28/2004 11:47:31 AM by the:
Alaska Department of Fish and Game
Mark, Tag, and Age Laboratory
cwt_web@fishgame.state.ak.us
(907) 465-4092
10107 Bentwood Place
PO BOX 25526
Juneau, AK 99802

Criteria for this report included:
Years: 2001, 2000
Species: 450 (Chum Salmon), 420 (Sockeye Salmon)
Districts: 325
SubDistricts: All
Location: %

| Year | Region | ManagementArea | Species | BeginDate | EndDate | District | Subdistrict | LocationCode | Location |
|---|---|---|---|---|---|---|---|---|---|
| 2000 | 2 | BRISTOLBAY | 420 | 6/30/2000 | 6/30/2000 | 325 | 10 | 401 | Igushik Inriver Test |
| 2000 | 2 | BRISTOLBAY | 420 | 7/4/2000 | 7/4/2000 | 325 | 10 | 301 | Igushik Section Set, Ekuk |
| 2000 | 2 | BRISTOLBAY | 420 | 6/28/2000 | 6/28/2000 | 325 | 10 | 302 | Igushik Section, Dillingham |
| 2000 | 2 | BRISTOLBAY | 420 | 6/25/2000 | 6/25/2000 | 325 | 10 | 101 | Igushik Tower |
| 2000 | 2 | BRISTOLBAY | 420 | 7/7/2000 | 7/7/2000 | 325 | 30 | 401 | Nushagak District Test |
| 2000 | 2 | BRISTOLBAY | 420 | 7/1/2000 | 7/1/2000 | 325 | 30 | 302 | Nushagak Section - Drift |
| 2000 | 2 | BRISTOLBAY | 420 | 6/17/2000 | 6/17/2000 | 325 | 30 | 101 | Nushagak Sonar/Tower |
| 2000 | 2 | BRISTOLBAY | 420 | 7/3/2000 | 7/3/2000 | 325 | 30 | 101 | Nuyakuk Tower |
| 2000 | 2 | BRISTOLBAY | 420 | 6/27/2000 | 6/27/2000 | 325 | 40 | 302 | Wood River SHA |
| 2000 | 2 | BRISTOLBAY | 420 | 6/26/2000 | 6/26/2000 | 325 | 30 | 101 | Wood River Tower |
| 2000 | 2 | BRISTOLBAY | 450 | 6/28/2000 | 6/28/2000 | 325 | 30 | 101 | Nushagak Escapement |
| 2000 | 2 | BRISTOLBAY | 450 | 6/30/2000 | 6/30/2000 | 325 | 30 | 302 | Nushagak Section, Dillingham |
| 2001 | 2 | BRISTOLBAY | 420 | 6/22/2001 | 6/22/2001 | 325 | 10 | 301 | Igushik Section |
| 2001 | 2 | BRISTOLBAY | 420 | 6/23/2001 | 6/23/2001 | 325 | 11 | 302 | Igushik Section Set |
| 2001 | 2 | BRISTOLBAY | 420 | 6/21/2001 | 6/21/2001 | 325 | 10 | 101 | Igushik Tower |
| 2001 | 2 | BRISTOLBAY | 420 | 6/24/2001 | 6/24/2001 | 325 | 30 | 302 | Nushagak Section - Drift |
| 2001 | 2 | BRISTOLBAY | 420 | 6/12/2001 | 6/12/2001 | 325 | 30 | 101 | Nushagak Sonar/Tower |
| 2001 | 2 | BRISTOLBAY | 420 | 6/30/2001 | 6/30/2001 | 325 | 30 | 101 | Nuyakuk Tower |
| 2001 | 2 | BRISTOLBAY | 420 | 6/24/2001 | 6/24/2001 | 325 | 30 | 101 | Wood River Tower |
| 2001 | 2 | BRISTOLBAY | 450 | 6/23/2001 | 6/23/2001 | 325 | 10 | 302 | Igushik Section, Dillingham |
| 2001 | 2 | BRISTOLBAY | 450 | 6/28/2001 | 6/28/2001 | 325 | 30 | 101 | Nushagak Escapement |
| 2001 | 2 | BRISTOLBAY | 450 | 6/24/2001 | 6/24/2001 | 325 | 30 | 302 | Nushagak Section, Dillingham |

Project / Gear / NumberofCards / Data (values in source order; empty cells omitted): 5 4 7 1 4 8 1 12 3 3 40 5 2 1 3 67 3 3 49 3 25 1 3 29 3 4 35 3 2 4 1 3 7 1 9 1 3 16 3 2 38 1 3 109 3 2 59 3 2 29 3 3 48 1 3 4 3 3 35 1 3 31

Appendix K.
This ASL Report was generated on 12/28/2004 1:16:06 PM by the:
Alaska Department of Fish and Game
Mark, Tag, and Age Laboratory
cwt_web@fishgame.state.ak.us
(907) 465-4092
10107 Bentwood Place
PO BOX 25526
Juneau, AK 99802

Criteria for this report included:
Years: 2002
Species: 420 (Sockeye)
Regions: All
Districts: 101
Harvest: All
Gear: All
Project: All
Stat Week: 1, 54

| Species | Year | Region | Area | Location | Stat Week | Harvest | Project | Gear | Sex | Length |
|---|---|---|---|---|---|---|---|---|---|---|
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 550 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 570 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 632 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 578 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 595 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 660 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 566 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 632 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 658 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 577 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 533 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 536 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 586 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 596 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 556 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 549 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 582 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 550 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 534 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 572 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 531 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 540 |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 530 |

Age (values in source order; empty cells omitted): 12 12 13 22 12 22 23 12 22 12 12 13 12 12 23 22 12 13 22 12 12

Comment: (no values)

Appendix L.
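The Data Characteristics, File Scope rule 1, and several mandatory validation rules from the Appendix A protocol can be illustrated with a small checker. This is a minimal Python sketch, not the repository's actual loader. The names `check_row` and `validate_submission` are hypothetical, only a subset of the 42 columns is checked, and the header row is used here to map columns by name (an assumption; the protocol only states that the first row is skipped on load). The species and stage code sets come from Appendices D and E.

```python
import csv
import io
from datetime import date

SPECIES_CODES = {"410", "420", "430", "440", "450", "470", "540", "666"}  # Appendix D
STAGE_CODES = set("AEFGIJPS")                                             # Appendix E


def check_row(row, current_year):
    """Return the rule violations found in one record (column name -> text)."""
    def value(col):
        return (row.get(col) or "").strip()   # an empty field (,,) is NULL

    problems = []
    if value("Region_ID") not in {"1", "2", "3", "4"}:
        problems.append("Region_ID must be 1, 2, 3 or 4")
    year = value("Sample_Year")
    if not (len(year) == 4 and year.isdecimal()
            and 1930 <= int(year) <= current_year):
        problems.append("Sample_Year must be between 1930 and the current year")
    if value("Species") not in SPECIES_CODES:
        problems.append("Species must match a code in Appendix D")
    if value("Stage") not in STAGE_CODES:
        problems.append("Stage must match a code in Appendix E")
    if value("Format") != "1.2X":
        problems.append('Format must be "1.2X"')
    day = value("Sample_Day")                 # optional: checked only if present
    if day and not (len(day) <= 2 and day.isdecimal() and 1 <= int(day) <= 31):
        problems.append("Sample_Day must be 1 or 2 digits in range 1 through 31")
    return problems


def validate_submission(text, current_year=None):
    """Return (line_number, message) pairs; line 0 marks a file-level problem."""
    current_year = current_year or date.today().year
    reader = csv.DictReader(io.StringIO(text))    # handles double-quoted fields
    errors, scope = [], set()
    for line_no, row in enumerate(reader, start=2):   # line 1 is the header
        errors.extend((line_no, msg) for msg in check_row(row, current_year))
        scope.add((row.get("Region_ID"), row.get("Sample_Year")))
    if len(scope) > 1:
        errors.append((0, "Region_ID and Sample_Year must be constant in a file"))
    return errors


# Hypothetical two-record submission; the second record has two invalid fields.
SAMPLE = (
    "Region_ID,Sample_Year,Species,Stage,Sample_Day,Comments,Format\n"
    '2,2001,420,A,24,"Wood River, tower",1.2X\n'
    "2,2001,999,A,45,,1.2X\n"
)
errors = validate_submission(SAMPLE, current_year=2005)
for line_no, message in errors:
    print(line_no, message)
```

Returning a list of line numbers and messages, rather than stopping at the first failure, matches the protocol's step of reporting all invalid rows back to the originator so the complete file can be corrected and resubmitted.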