r0311_final - North Pacific Research Board

North Pacific Research Board: Final Report
I. Title: Establishing a statewide data warehouse of salmon size, age and growth records
Principal Investigator(s) and Recipient Organization(s): Beverly Agler, Alaska
Department of Fish and Game, Mark, Tag, and Age Lab, 10107 Bentwood Place,
Juneau, AK 99801 bev_agler@fishgame.state.ak.us
Project #: R0311
Contract Period and Amount of Funding: July 1, 2003-January 31, 2005
II. Abstract:
Collecting biological data is fundamental to managing and monitoring Alaska’s
salmon resources. Throughout Alaska, hundreds of thousands of salmon are examined
annually for sex and size information, and scales are collected for later extraction of age
data. This represents an enormous sampling effort that over 40-plus years has resulted in
collections of data records and scale samples that number in the millions. This
information has been used for local management needs, which usually involves
compiling and tracking data in summary form. As a result, there has been no common
process or protocol for managing and preserving the historical data and scale samples.
The intention of this project was to initiate a process for establishing an electronic data
warehouse environment through which historical salmon sampling and scale pattern data
can be maintained and updated annually from collections throughout the state. This
project was designed to inventory all collections and to establish a steering committee
composed of state, federal and research interests to develop common protocols and plan a
strategic approach. The long-term product will be a standardized web-accessible
database where the sampling information will be available to facilitate future research.
III. Purpose:
a. Detailed description of reason for funding and/or problem that was addressed.
The intention of this project was to inventory and to determine the best methodology
to provide access to a valuable and unique collection of salmon specimen data collected
in Alaska by the Alaska Department of Fish and Game (ADFG). Specifically, we
requested funding to inventory all ADFG data collections and to determine the process,
protocols, and funding needed to establish a data repository of salmon size, sex, and age
data for the State of Alaska. The data repository would ultimately incorporate scale
images and the associated growth data as well as make the information available to the
public and researchers via a web-based interface. The repository would maintain copies
of the data that are collected by the ADFG Regional offices and provide secondary
filtering or processing as needed.
While we have referred to this end product as a data warehouse, in the lexicon of
data management it would more accurately be described as a repository or a data mart.
Data warehouses typically contain diverse data types which may cover, under one
umbrella, the wide ranging information and data needs of a large institution. They are
complex and difficult entities to build and have not always been successfully deployed.
In contrast, a data mart provides views and reports of data that have common attributes,
such as the case with this sampling data. They require fewer resources for maintaining
the data in a repository and have a less complex data structure and thus a greater
likelihood for successful deployment. Data accessed through a data mart can still be
matched and merged with other databases. The first challenge, however, was to establish
a repository of the data. These samples were collected throughout the state over the past
45 years and the collections were dispersed throughout the state. We also tried to
accommodate changes in collecting protocols that may occur in the future.
The protocols for collecting data from salmon in Alaska were largely in place
before statehood when salmon biologists (primarily with the U.S. Bureau of Commercial
Fisheries and the University of Washington) used sampling techniques that were
commonly used along the Pacific Coast. After statehood, many of these practices, such
as storing scales on gum cards in the field, and making impressions of these scales on
acetate cards for examination of age and growth and archiving, were adopted by state
biologists and are still in use today. To keep track of the records, an index number was
used to associate scale cards with datasheets containing sampling information and the
individual size and sex data. After an experienced reader viewed the scale acetate cards
with a microfiche projector, the ages were recorded and tabulated to determine age, sex
and size composition of the sample. Thus, these data are extensive. Scales from all
salmonid species have been collected within Alaska from 1960 to the present.
Currently, the number of records exceeds 12 million, and this number increases each year. Most
major drainages and stocks of salmon in Alaska have been sampled, and in many
locations, this sampling has been continuous since statehood. These data are primarily
housed in the four regional offices of ADFG, though portions of collections are stored at
the field offices.
In the late 1970’s, the ADFG Division of Commercial Fisheries established a
statewide stock separation program, which developed common protocols for recording
the age, sex, and length data in standardized data fields. These protocols were deployed
throughout the state and provided a level of consistency. This centralized program,
however, was dissolved in the early 1980’s, and project leaders began to modify data
collection methods and the codes used to differentiate items such as gear type, location,
etc. to suit the requirements of an individual project. The advent of microcomputers
enabled the transfer of the raw data from the forms to text files. Some of these files were
saved on 5 1/4” floppy disks, whereas others were transferred to separate R:Base and
Access databases on desktop PCs. Data from many systems, prior to the 1980’s, are
available only from the original datasheets.
In recent years, a number of research projects have utilized these collections by
extracting growth data from scales as a means to investigate salmon production in both
marine and freshwater environments. Although most of these projects used
image-processing methods to extract measurements from the scales, the methods were not easily
reproducible among studies. The resulting datasets, which were generally kept on
spreadsheets, were not compatible with other datasets. Recent projects have addressed
this issue by preserving scales as high-resolution images and documenting the criteria
used to extract measurements. Hagen et al. (2001) extracted growth data and combined
these with the original sampling data in relational databases. Ultimately, matching and
merging these scale data with the salmon sampling data and perhaps other products
derived from sampling, such as genetic or otolith stock identification, remains a
long-term task.
b. Objectives of the project.
There were two major concerns that this project intended to address: loss of
historical sampling data and loss of the opportunity to create an integrated structured data
approach in which the whole will be greater than the sum of the parts. While there was
general recognition that the historical salmon data were valuable, ADFG does not have
the resources to treat the records as true archival documents. Often portions of the data
have been borrowed, and samples were lost or misplaced in the process. In addition,
through personnel turnover, there has been a loss of continuity on how and where to
preserve the samples after they were collected and used.
Failure to combine datasets across the state has also been a concern. Although each
region intended to modernize their data collection and management, these efforts have
occurred on different schedules and, in the absence of common protocols, they have
resulted in datasets that are less compatible. The benefits of combining datasets should
be obvious: salmon produced in each region share their time in the ocean with each other
as well as with salmon from other countries. By not utilizing all the datasets, we lose an
opportunity to see common patterns and trends that otherwise might improve our
understanding of salmon production and increase our ability to forecast returns.
Although conceived as a multiyear project, this initial year of funding was to
evaluate the project, identify problems and solutions, and investigate options for further
funding. The bulk of these funds was used to conduct a comprehensive inventory of the
state’s salmon sampling data collections. The inventory process forced regional offices
to locate and identify available electronic and hard copy data, note the quality and
condition of the information, identify common and conflicting data fields, and assess
potential problems in combining the datasets.
The inventory as well as the data standards developed by the Data Standards
Committee has been instrumental in estimating the development and maintenance costs
for a statewide database. The steering committee also identified other funding
opportunities to cover infrastructure costs and assisted in creating partnerships between
state and federal agencies as well as the research community.
Thus, our objectives included:
1. Inventory all scale collections statewide to:
a. Determine presence or absence of gum cards,
b. Determine presence or absence of acetates,
c. Determine presence or absence of associated data, and
d. Provide access to the inventory through the Internet.
2. Establish a statewide steering committee composed of state, Federal and
non-governmental researchers to:
a. Develop a common vision of a web-accessible central repository of these
data,
b. Facilitate communication and evaluate policy and priorities for
establishment and maintenance of the data warehouse,
c. Establish data standards to ensure present and future compatibility,
d. Supervise completion of the inventory of sample records, and
e. Continue scoping issues applicable to achieving the long range goal for
creating a central repository of the sampling data.
IV. Approach:
a. Detailed description of the work that was performed, to include (if applicable):
methods or techniques and materials used, design of the study, sample sizes
planned, kind of controls and proposed analysis for the results.
First, we established a steering committee of state, Federal and non-governmental
researchers, and we held our first meeting in Anchorage on September 29, 2003. Sixteen
representatives from the four ADFG regions as well as representatives of other federal
and research interests attended the meeting.
At this meeting, we discussed the background and rationale for this grant. Pete
Hagen, National Marine Fisheries Service (NMFS), Auke Bay Lab, outlined some
potential funding sources for creation of a statewide database. Bev Agler, ADFG, Mark,
Tag, and Age Lab (MTA), gave a short presentation describing the retrospective studies
being conducted by the MTA Lab in conjunction with USGS, NMFS and the North
Pacific Marine Research Program. The growth data generated from the digital images of
the scales has been analyzed extensively by Greg Ruggerone, NRC, Inc. Ruggerone
recently published several papers in which he correlated growth of sockeye salmon with
climate change and demonstrated an inversely proportional relationship between Bristol
Bay sockeye salmon growth and the abundance of Asian pink populations (Ruggerone
and Goetz 2004, Ruggerone and Rogers 2003, Ruggerone et al. 2003, 2005).
Phil Mundy, representing the Exxon Valdez Oil Spill Trustee Council and a
member of the NPRB board, spoke about the funding priorities of these organizations.
James Brady, Ecotrust and the Wild Salmon Center, Portland, Oregon, described the
State of the Salmon project, sponsored by the Moore Foundation. They plan to build a
database of salmon population data for California, Oregon, Washington, British
Columbia, Alaska, Russia, and Japan, and to create a web-based iterative system using a
metadata geo-referenced approach to facilitate ease of information exchange.
Bill Johnson, Information Technology (IT) staff for ADFG’s MTA Lab, and
Steve Gebhart, IT staff for ADFG Headquarters, spoke about management of large
databases and how ADFG has handled these statewide, centralized repositories. ADFG is
currently creating a division-wide (Commercial Fisheries) metadata system.
Representatives from each of the four ADFG regions (see Section IVb, Project Management) outlined the
present status of their data archives. Each region described their current archiving
techniques and data storage. Most data are stored in text files by year and sample site and
are only accessible by opening a file for each year. Thus, to examine 40 years of data, a
researcher must examine 40 different files. The MTA Lab has created Microsoft Access
databases for the few systems they have analyzed for historical scale growth data, but the
databases are incomplete. Where no electronic data existed, usually prior to 1985, only
data for the scales that were examined were included in these databases. Regions 1 and 3
have been compiling files into larger databases and have added some historical data as
well. Regional representatives also discussed present data collection methods and
exchanged information on new techniques to standardize this process.
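To illustrate the access problem described above, the following minimal Python sketch shows how a set of per-year text files might be read into one combined list of records. The file naming pattern (asl_*.txt), the comma-delimited layout with a header row, and the function name are assumptions made for illustration only; the regional files differ in format and would each need their own reader.

```python
import csv
from pathlib import Path


def combine_yearly_files(directory, pattern="asl_*.txt"):
    """Read every per-year file matching `pattern` into one list of rows.

    Assumes (hypothetically) that each file is comma-delimited and starts
    with a header row naming its fields.
    """
    combined = []
    for path in sorted(Path(directory).glob(pattern)):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                # Record which file the row came from so it can be traced back.
                row["source_file"] = path.name
                combined.append(row)
    return combined
```

Keeping the source file name with each row preserves a link back to the original yearly file if a record later needs to be checked.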
Region | ASL records, no. (millions) | Comments | Data storage
1 – Southeast Alaska | 3.2 | Records increased by 0.5 million per year | ORACLE database, includes historical data
2 – PWS, Cook Inlet, Bristol Bay | 4 | Represents 35-40 years of data. Bristol Bay scales make up bulk of this at 1-2 million | Some Access files, mostly individual text files
3 – Yukon Kuskokwim | ~2-3 | Estimates, based on summary file data | Data were being loaded into Access files as it is screened from hard copies and text files
4 – Kodiak | 2 | Adults, 1985-present, older data not included. Smolts, 1989-present (0.25 million). Sample 50,000-100,000 fish per year. | 1985-present in R:Base. Older data in banker boxes, some stored at seasonal field stations.
Statewide totals | ~11-13 | Sample rate ~1-2 million per year |
As shown above, approximately 11-13 million salmon scales and their associated
sampling data have been collected by ADFG since statehood. Efforts to compile these
records electronically started independently in the various regions. However, there has
been no common process or coordination. A primary outcome of the initial meeting was
an agreement to form a Data Standards Committee to develop protocols for field and
table definitions and data exchange.
The participants of the first meeting agreed upon a common vision of creating a
web-accessible central repository of the data. There was also recognition that a large
amount of effort is needed to verify and edit the raw data if it is to be retrievable
electronically, and this would have to occur in stages. The establishment of data
standards will help ensure compatibility during this process.
Consequently, we held a second meeting in Anchorage on January 15, 2004. This
meeting was attended by the nine members of the four ADFG Regional Offices and the
MTA Lab, who were appointed to the Data Standards Committee. Discussions at this
meeting focused on two major topics: (1) creation of an inventory of all ADFG age, sex,
length data and (2) development of data standards for a statewide database.
Each region described the status of their electronic inventories with regard to
scale and age, sex, length data.
1. Region 1 (Southeast Alaska) had all data in ORACLE from which they
could create an inventory.
2. Region 2 (Central Region, Prince William Sound, Cook Inlet, Bristol
Bay) was compiling an inventory for Bristol Bay, but other areas, such
as Cook Inlet and Prince William Sound, still needed to be addressed.
3. Region 3 (Arctic, Yukon, Kuskokwim Region) was in the process of
updating their inventory. Their inventory was entirely electronic and
was used as a template for the statewide inventory.
4. Region 4 (Kodiak, Chignik, Alaska Peninsula and Aleutian Islands
fisheries) had 1985-present data in R:Base, so it was possible to extract
an inventory from those data. It was questionable whether they could
inventory older data, because many of the older records and scale
cards were housed at field offices that were inaccessible at that time of
year. Fortunately, this grant provided the impetus and funding to
assemble these records. They are now all stored in Kodiak.
This NPRB grant provided approximately one month of salary for a Fish and Wildlife
Technician in each region to complete the inventories.
The Data Standards Committee chose to follow the format of data standards
adopted by the Pacific Salmon Commission. The committee developed a set of data
standards, which can be viewed on the Internet at:
http://www.taglab.org/ASL/reports/ASL%20spec.doc. These protocols were circulated
among committee members via email and updated at the May and August meetings of the
Steering Committee. Once data transfer protocols (Appendix A) were discussed, these
data standards were changed to a slightly different format viewable on the Internet at:
http://www.taglab.org/ASL/reports/ASL%20specX.doc. The group originally developed a
two-table structure for the database, but this was changed to a single flat table to simplify
data transfer.
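The change from two tables to a single flat table can be sketched as follows. The report does not describe the original two-table layout, so this example assumes one table of sampling events and one table of individual fish linked by a shared identifier; all field names and the function name are hypothetical.

```python
def flatten_to_single_table(samples, specimens, key="sample_id"):
    """Repeat each sampling event's fields on every fish record it owns.

    `samples` and `specimens` are lists of dictionaries; `key` is the
    (assumed) field that links a fish to its sampling event.
    """
    samples_by_id = {sample[key]: sample for sample in samples}
    flat_rows = []
    for fish in specimens:
        if fish[key] not in samples_by_id:
            # A fish without a sampling event cannot be flattened; fail loudly.
            raise ValueError(f"no sampling event for {key}={fish[key]!r}")
        flat_rows.append({**samples_by_id[fish[key]], **fish})
    return flat_rows
```

A flat table repeats the sampling information on every row, which costs storage, but each row can be transferred and validated on its own without a second file.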
The steering committee met again in Anchorage May 18, 2004 to coordinate work
on the statewide inventory of age, sex and length data. The committee discussed the
steps necessary to develop a statewide ASL database.
1. Development of the database (ORACLE).
2. Addition of new data to the database.
3. Recovery of the historic electronic data, generally 1985-present.
a. Data would need to be verified and formatted.
4. Recovery and entry of data from non-electronic sources (e.g. data forms).
5. Linking of the ASL database with the digitized scales and their associated growth
and age data housed at the MTA Lab.
The steering committee also developed a list delineating the steps required to
create a statewide database:
1. Development of metadata standards.
2. Development of an ORACLE database with a Microsoft Access user front end
to be used to interface and query the database.
3. Development of Web-accessible site.
4. Additional data verification and data retrieval by MTA Lab.
5. Data recovery of current electronic data by regions.
6. Data recovery of historical records by regions, including data entry.
“Scrubbing” or verification and editing of the data was discussed in some detail.
When the electronic inventory for Region 3 was first added into ORACLE, a number of
rows were lost due to inconsistencies in data entry. Consequently, data will have to be
edited for errors and inconsistencies to prevent loss of important data.
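One way to avoid losing rows silently during loading is to check each record first and set aside the ones that fail, along with the reason. The sketch below is only an illustration of that idea; the field names (sex, length_mm), the list of valid sex codes, and the checks themselves are assumptions and are not taken from the ASL data standards.

```python
VALID_SEX_CODES = {"M", "F", "U"}  # hypothetical code list


def check_row(row):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    if row.get("sex") not in VALID_SEX_CODES:
        problems.append("unknown sex code")
    try:
        if float(row.get("length_mm", "")) <= 0:
            problems.append("non-positive length")
    except (TypeError, ValueError):
        problems.append("length is not a number")
    return problems


def scrub(rows):
    """Split records into those ready to load and those needing editing."""
    clean, rejected = [], []
    for row_number, row in enumerate(rows, start=1):
        problems = check_row(row)
        if problems:
            # Keep the row and its problems so nothing is dropped unseen.
            rejected.append((row_number, row, problems))
        else:
            clean.append(row)
    return clean, rejected
```

The rejected list can be returned to the region that supplied the data, so that records are corrected rather than lost.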
The group also discussed where a statewide ASL database could be housed. Most
participants felt that the MTA Lab had the infrastructure, because of the large databases
they currently maintain that are associated with the Coded Wire Tag and the Thermal
Mark Laboratories. Bill Johnson stated that the MTA Lab would need some additional
hardware to house another large database, particularly because the age, sex, length
database would be much larger than the Coded Wire Tag database.
Although not specifically part of the grant, these meetings brought together
ADFG regional staff that are usually separated by distance and allowed us to discuss
topics related to these age, sex, length data. For example, John Wilcock demonstrated
some data collection methodologies that he has been developing to use WinCE and
PalmOS applications to enter data directly into a database. Electronic data collection
would eliminate the need for the computer forms currently in use, which have had
problems with inconsistencies if not filled out correctly. This project would ultimately
benefit all regions. Feasibility testing for this new data entry system was scheduled
for Ketchikan during summer 2004, with major testing of other types of data collection in
summer 2005.
To coordinate work on a new proposal and the final report for this grant, the
steering committee conducted its final meeting in Juneau on 9 August 2004. First, we
discussed the status of the regional inventories. All regions had submitted at least partial
inventories. Bill Johnson demonstrated the web-accessible inventory (Appendix B)
http://www.taglab.org/ASL/reports/default.asp.
Reports can be run either to the screen or to a comma-separated values (CSV) file, which can
then be opened in a program of the user’s preference, such as Microsoft Excel. A Microsoft
Access version of the ASL inventory is also available by accessing the MTA Lab’s ftp
server at ftp://ftp.taglab.org/.
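Because the reports can be saved as CSV, they can also be read by a short script instead of a spreadsheet. The following Python sketch filters the text of such a file by one or more column values. The column names used in the usage note (year, species, district) are assumptions, since the actual headers of the exported file are not listed in this report.

```python
import csv
import io


def filter_report(csv_text, **criteria):
    """Return the rows of a CSV report whose columns match all criteria.

    `csv_text` is the full text of a CSV file with a header row. Values are
    compared as strings because CSV carries no type information.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        row
        for row in reader
        if all(row.get(column) == str(value) for column, value in criteria.items())
    ]
```

For example, filter_report(text, year=2001, species="sockeye") would keep only the rows whose year and species columns hold those values, provided the file uses those column names.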
Johnson also demonstrated the Test ASL database, located on the Internet at the
same URL and listed as ASL Specimen Report. The ASL Specimen Report provides
access to a “demonstration only” ORACLE database with three years of age, sex, length
data from Region 1. It was developed to demonstrate the potential accomplishments as
this project progresses.
The third report available at this website is the ASL Availability Report. This
report was developed to allow regions and MTA Lab staff to track submissions to the
ASL database. This report advises regional personnel on the accessibility of the data. It
was suggested that data be transferred to the MTA Lab once a year. The data would then
be added to the ORACLE database. When this occurred, it would appear on the ASL
Availability form.
We finished the meeting by discussing how to refine the overall goals of the
project. The committee discussed whether this should be a database where data were
entered directly from the field and then used by managers for direct management needs,
or whether this should just be a central repository archiving the historical records for the
state of Alaska. Most regional participants felt that the database would not be responsive
enough for the former alternative. Regions would still be required to coordinate and
house smaller regional databases; data would then be added to the statewide database by
year and region, starting with the current data and working back through the historical
data. Consequently, regions would still be responsible for their own management reports
and queries needed during field season.
b. Project Management: List individuals and/or organizations actually performing the
work and how it was done.
1) Bev Agler, Fishery Biologist III, served as the Principal Investigator. She coordinates
the scale pattern research conducted by the ADFG’s MTA Lab and directs the efforts to
create a comprehensive inventory of the state’s salmon sampling records. She organized
the meetings and drafted the minutes for each meeting among the members of the
steering committee. Each region prepared an inventory, which was then submitted to the
MTA Lab. Agler supervised completion of the final inventory in Microsoft Access and
worked closely with Bill Johnson (see below) to develop the web-accessible version.
2) Bill Johnson, Analyst Programmer IV for ADFG’s MTA Lab, provided technical
expertise on administration and construction of a data warehouse. He has over 20 years
of experience in IT development and has worked with diverse fisheries datasets. He
converted the Microsoft Access inventory to ORACLE and developed the web-accessible
interface that allows users to access and search the data through the Internet.
3) A steering committee was established in August 2003. It was composed of members
representing the regional offices of ADFG.
a. Region 1 (Southeast Alaska)
i. John Wilcock, Fishery Biologist III, who is currently coordinating
a project to integrate ASL data for Region 1, and
ii. Scott Johnson, Analyst Programmer IV.
b. Region 2 (Central Region- Prince William Sound, Cook Inlet, and Bristol
Bay)
i. Tim Baker, Research Analyst III,
ii. Lowell Fair, Area Research Biologist, and
iii. Fred West, Assistant Area Research Biologist.
c. Region 3 (Arctic-Yukon-Kuskokwim Region)
i. Linda Brannian, Fishery Biologist IV,
ii. Seth Darr, Analyst Programmer IV, and
iii. Holly Moore, Analyst Programmer III.
d. Region 4 (Kodiak, Chignik, Alaska Peninsula and Aleutian Islands)
i. Patty Nelson, Regional Finfish Research Supervisor, and
ii. Matt Foster, Kodiak Finfish Research Biologist.
e. Other members
i. Dr. Peter Hagen, NOAA-NMFS, Auke Bay Laboratory, facilitated
the creation of the steering committee and incorporated a research
partnership perspective,
ii. Other personnel attended the meetings depending on location, such
as Dion Oxman and Bill Rosky from the MTA Lab when the
meeting was held in Juneau, and
iii. Some participants were invited to lend their perspectives to issues
such as potential funding and management of large data
warehouses.
1. Dr. Phil Mundy, EVOS Trustee Council
2. Tom Jarvis, Analyst Programmer, ADFG
3. James Brady, Ecotrust, Wild Salmon Center
4. Steve Gebert, ADFG Headquarters, IT Staff
5. Rob Bochenek, EVOS Trustee Council, GEM Data
Systems Manager
6. Ron Josephson, Alaska Mark and Tag Coordinator, MTA
Lab
V. Findings:
a. Actual accomplishments and findings.
The Data Standards Committee developed the field and table definitions for the
data warehouse and suggested protocols for data exchange. These data standards
(Appendix A) can be accessed via the Internet at:
http://www.taglab.org/ASL/reports/ASL%20spec.doc.
As part of this process, the MTA Lab compiled several Internet-accessible reports
to demonstrate how data could be compiled and accessed more easily through a statewide
program. Several reports are available by going to http://www.taglab.org/ASL/reports/.
The first report is a comprehensive inventory of scale specimens (Appendix J). Selecting
“ASL Inventory Report” will take the user to a screen where they can query an ORACLE
database containing inventories of most of the scales collected by the four regions. This
screen allows the user to select year, species, district, sub-district and location. The
underlying database contains ~18,000 rows of data. Once these items have been selected,
the report can either be displayed or sent to a file that can be opened with Microsoft
Excel.
For demonstration purposes only, we developed an “Age, Sex, Length Specimen
Report” that shows some possible uses of these data (Appendix K). Currently, the
database only contains three years of ASL data (1982, 1992, and 2002) from Southeast
Alaska. Again, various criteria are available to allow the user to specify boundaries on
the query and thus limit the amount of data returned by the database.
The third report (Appendix L), the “ASL Availability Report,” was developed to
assist regional personnel to follow the data and determine which files have been
submitted to the data warehouse, which files have been validated and which files are now
accessible through the Internet database.
We also started developing Data Transfer Protocols. These guidelines can be
accessed via the Internet at: http://www.taglab.org/ASL/reports/ASL%20specX.doc.
While research interests were originally to serve in partnership on the steering
committee, it was apparent after the first meeting that much of the initial work required
input from ADFG staff familiar with the data. It is anticipated that research perspectives
will be brought back in during subsequent phases to help establish priorities in
developing reports and delivery products.
b. If significant problems developed which resulted in less than satisfactory or
negative results, they should be addressed.
No significant problems developed. A couple of small collections were not
inventoried within the framework of the grant, but they will be inventoried at the earliest
opportunity, and the data will be added to the inventory.
c. Describe needs of additional work, if results suggest further study.
Given the extensive geographic and temporal range represented by this collection,
compiling these data in an accessible format will augment future research and monitoring
programs. The size and complexity of this endeavor, however, require that it be
implemented in phases. Phase 1, the focus of this report, has been completed. In Phase
2, the proposed work will involve populating this database with all available electronic
records, developing reports and procedures for annually updating the database from the
various collection sites, and addressing the recovery and data entry of records available
only as hard copies on historic data sheets. Effort will also be directed towards planning
the final phase of the project, which would be to complete historical data entry of key
regional systems and to join this database to derivative products collected from other
funding sources, such as the growth histories of salmon extracted from the scales via
image analysis, which are currently housed in separate databases at the MTA Lab.
Scale growth data in particular – which in its simplest form may contain up to 120
measurements per specimen – provides an enormously detailed means of tracking
changes in marine conditions that support salmon production and has proven useful in
formulating predictive models. Capturing and preserving these records in a statewide
database would be a valuable contribution to long-term monitoring and predictive
modeling programs.
VI. Evaluation:
a. Describe the extent to which the project goals and objectives were attained. This
description should address the following:
1. Were the goals and objectives attained?
2. Were modifications made to the goals and objectives? If so, explain.
As stated in section IIIb, our major objectives were to inventory all commercial
salmonid scale collections statewide to determine presence or absence of scales and the
associated data and to provide access to this inventory through the Internet. We also
planned to establish a steering committee of state, Federal and non-governmental
researchers to develop a common vision of a web-accessible central repository of these
data, facilitate communication, and evaluate policy and priorities for establishment and
maintenance of the data warehouse. We established data standards to ensure
compatibility during this process and in the future, supervised completion of an inventory
of sample records, and continued scoping issues applicable to achieving the long range
goal of creating a central repository of the sampling data. We were able to achieve these
goals and objectives and have developed a proposal to carry this project into the second
phase, converting the data into a data warehouse or data mart.
b. Dissemination of project results:
Explain, in detail, how the project results have been, and will be, disseminated.
To disseminate the project results, we developed several web-accessible forms.
For example, the data standards (Appendix A-I) developed by the Steering Committee
can be accessed via the Internet at: http://www.taglab.org/ASL/reports/ASL%20spec.doc.
Several reports (Appendix J-L) are accessible at: http://www.taglab.org/ASL/reports/.
The first report is a comprehensive inventory of scale specimens (Appendix J). Selecting
“ASL Inventory Report” will take the user to a screen where they can query an ORACLE
database containing inventories of most of the scales collected by the four regions. This
screen allows the user to select year, species, district, sub-district and location. The
underlying database contains ~18,000 rows of data. Once these items have been selected,
the report can either be displayed or sent to a file that can be opened with Microsoft
Excel. Please see Section Va. “Actual Accomplishments and Findings” for further details
for accessing these reports.
References
Hagen, P.T., D.S. Oxman, and B.A. Agler. 2001. Developing and deploying a high resolution
imaging approach for scale analysis. (NPAFC Doc. 567). 11 p. Mark, Tag, and Age
Lab, Alaska Department of Fish and Game, Juneau, Alaska.
Ruggerone, G. T. and Rogers, D. E. 2003. Multi-year effects of high densities of
sockeye salmon spawners on juvenile salmon growth and survival: a case
study from the Exxon Valdez oil spill. Fish. Res. 63: 379-392.
Ruggerone, G. T., Zimmermann, M., Myers, K. W., Nielsen, J. L., and Rogers, D. E.
2003. Competition between Asian pink salmon (Oncorhynchus gorbuscha)
and Alaskan sockeye salmon (O. nerka) in the North Pacific Ocean. Fish.
Oceanogr. 12: 209-219.
Ruggerone, G. T. and Goetz, F. 2004. Survival of Puget Sound chinook salmon
(Oncorhynchus tshawytscha) in response to climate-induced competition with
pink salmon (Oncorhynchus gorbuscha). Can. J. Fish. Aquat. Sci. 61: 1756-1770.
Ruggerone, G. T., Farley, E., Nielsen, J., and Hagen, P. 2005. Seasonal marine growth
of Bristol Bay sockeye salmon (Oncorhynchus nerka) in relation to
competition with Asian pink salmon (O. gorbuscha) and the 1977 ocean
regime shift. Fish. Bull. 103: 355-370.
Appendices
Appendix A-I. Data Transfer Protocols developed in conjunction with the Data
Standards Committee to simplify transfer of data from regional data collections to the
statewide data warehouse.
Appendix J. Example of the ASL Inventory Form available on the Internet at:
http://www.taglab.org/ASL/reports/inventory.asp. Queries of this database can be limited
by choosing years, species, districts, sub-districts, and location or some combination of
these items. The second page shows the results of a query that we ran where we selected
two years of data (2000 and 2001) for chum and sockeye salmon from District 325
(Nushagak area).
Appendix K. Example of the ASL Specimen Report available on the Internet at:
http://www.taglab.org/ASL/reports/sample.asp. This is also called the Age Sex Length
Sample Form. This demonstration database contains only three years of data from
Southeast Alaska. It was developed to demonstrate queries that could be created once the
data warehouse was in operation. The second page shows the results of a query that we
ran where we selected sockeye salmon from the year 2002 from district 101.
Appendix L. Example of the ASL Availability Report available on the Internet at:
http://www.taglab.org/ASL/reports/availability.asp. This report allows regional users of
the data warehouse to track data that have been submitted to see if data have been
received and validated.
Appendix A
Salmon Age-Sex-Length Data Transfer Protocol
Variations of format, coding, and definitions have been used among regions in collecting Alaska’s detailed salmon Age-Sex-Length data. Differences have
compounded over time, as staffing and program requirements changed. This protocol is defined to facilitate the transmission of Alaska’s historic and future
salmon Age-Sex-Length biological data into a central repository. It provides a clear specification that explains the nature of data to the users, and a well-defined
transmission mechanism for populating the repository.
A. Data Submission Specification
Max Chars refers to the maximum number of characters a field’s value may contain. Reqd indicates whether the field must contain a value (nulls ARE NOT
accepted when Reqd=Yes). Data Type references how the column should be defined in a relational database. Validation gives specific rules that the submission
must fully meet in order to be accepted into the repository.
ASL Repository Data Submission Specification

| Ref | Column Name (Alternate Name) | Max Chars | Reqd | Data Type | Description | Validation |
|---|---|---|---|---|---|---|
| 1 | Sample_ID | 23 | | Character | Value generated by the reporting region, which may be used to associate specific records in the original regional data as belonging to a particular sampling event. | Uniqueness is desired, but not mandatory. |
| 2 | Region_ID | 1 | Yes | Character | Commercial Fisheries region of collection | Must be ‘1’, ‘2’, ‘3’ or ‘4’ |
| 3 | Sample_Year | 4 | Yes | Character | 4-digit year in which sampling event occurred | Must be between 1930 and the current calendar year |
| 4 | Management_area | 3 | | Character | Geographic area spanning a number of districts. Typically defined at the region level for their areas of interest. | |
| 5 | Tix_management_area | 1 | | Character | Management Area code formally defined by the fish ticket system. | If present, must match an existing code in Appendix A |
| 6 | District | 3 | | Character | Three character district of observation | If present, must match a current or historic CF fish ticket district |
| 7 | Subdistrict | 3 | | Character | ID for subset of district sampled, if any | If present, must be exactly two digits |
| 8 | Stream | 50 | | Character | ID for anadromous waters catalog stream sampled | R 3 always 0 |
| 9 | Location | 3 | | Character | R 1: port codes; R 2: stream location; R 3: stream location, some fish tickets; R 4: some stream locations | |
| 10 | Project | 2 | | Character | “Fishery Type” Legacy code. Typical values are in Appendix I | |
| 11 | Sample_Day | 2 | | Character | Day this sample was taken, or began to be taken. | If present, must be 1 or 2 digits in range 1 through 31 |
| 12 | Sample_Month | 2 | | Character | Month number when this sample was started | If present, must be 1 or 2 digits in range 1 through 12 |
| 13 | Sample_Date | 10 | | Date | Single date sample was taken | If present, must be a valid date. mm/dd/yyyy |
| 14 | Gear | 2 | | Character | Type of collection gear. Typical values are in Appendix H. | |
| 15 | Harvest_Code | 2 | | Character | Type of commercial fishery sampled | If present, must match a current or historic Harvest Code in Appendix B |
| 16 | Mesh | 5 | | Character | Net mesh size | Converted to inches. None for R1 and R4 |
| 17 | Length_Type (Measurement Type) | 2 | | Character | Codes indicating type of length measurement | If present, must match a mark-sense length code in Appendix C |
| 18 | Number_Scales | 1 | | Number | Number of scales per fish | If present, must be a valid whole number with no punctuation |
| 19 | Number_Cards | 1 | | Number | Number of gum cards taken for a particular mark-sense form | If present, must be a valid whole number with no punctuation |
| 20 | Form_Number | 15 | | Character | Mark-sense data form sequence number | If present, must be digits |
| 21 | Species | 3 | Yes | Character | Salmon species code | Must match a Species Code in Appendix D |
| 22 | Stage | 1 | Yes | Character | Indicates Juvenile, adult, etc. | Must match a stage code in Appendix E |
| 23 | Batch_Number (Data Link) | 20 | | Character | Region specific – track uploading of data or data source | |
| 24 | Stat_week | 2 | | Character | Statistical week | If present, must be 1 or 2 characters representing a number between 1 and 54 |
| 25 | Period | 2 | | Character | Openings (e.g. R3) | |
| 26 | Comments | 50 | | Character | | |
| 27 | Specimen_ID | 4 | | Character | Value generated by the submitter which identifies a particular observation in a sample. Typically, they range 1 through n. | |
| 28 | Card_Number | 3 | | Character | Sequence number of gum card in a particular collection | |
| 29 | Fish_Number | 5 | | Character | | |
| 30 | Sex | 1 | | Character | Male, female, indeterminate | If present, must match an existing sex code in Appendix F |
| 31 | Length | 4 | | Number | Length of fish in millimeters | If present, must be a valid whole number |
| 32 | FW_Age | 1 | | Character | Freshwater age using European method | If present, must be a digit |
| 33 | SW_Age | 1 | | Character | Saltwater age using European method | If present, must be a digit |
| 34 | Age_Error_code | 10 | | Character | String of one or more digits indicating problems in aging. Regional use varies somewhat. Typical values are in Appendix G. | If present, must be composed of digits 0 through 9 |
| 35 | Weight | 7 | | Number | Weight in grams to the nearest tenth of a gram | If present, must be a number in the range 0.1 through 99999.9 |
| 36 | CWT_Head_Number (Strap Tag, Cinch Strap) | 6 | | Character | Six digit strap tag number identifying a head collected for the coded wire tag lab | If present, must be all digits |
| 37 | Mark_Recapture_Tag | 10 | | Character | Up to 10 characters from a tag used in mark-recapture programs: disk tag, spaghetti tag, etc. | |
| 38 | DNA_Number (Silly Code) | 25 | | Character | Up to 10 characters used to identify DNA specimen collected for genetics lab | |
| 39 | Otolith_Number (BP Coordination #, Brain Parasite #) | 8 | | Character | Up to 8 characters used to identify a specimen collected for otolith lab | |
| 40 | User_code | 8 | | Character | Region 1 only | |
| 41 | Image_Name | 12 | | Character | File name for digitized scale image | |
| 42 | Format | 6 | Yes | Character | The version of the transfer specification used to build the file. Every row in the file must have this same value. | Must be “1.2X” |
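As an illustration of how the validation rules in the specification might be applied, the following is a minimal Python sketch covering a handful of fields. It is not the repository's actual loader: the function name `validate_record`, the dictionary-of-strings record layout, and the error message wording are assumptions made for this example. The code sets are copied from Appendices D, E and F.

```python
import re
from datetime import date

# Code sets copied from Appendices D, E and F of this protocol.
SPECIES_CODES = {"410", "420", "430", "440", "450", "470", "540", "666"}
STAGE_CODES = {"A", "E", "F", "G", "I", "J", "P", "S"}
SEX_CODES = {"1", "2", "3"}
# Fields marked Reqd=Yes in the specification.
REQUIRED = ("Region_ID", "Sample_Year", "Species", "Stage", "Format")


def validate_record(rec, current_year=None):
    """Return a list of rule violations for one row (hypothetical helper).

    `rec` maps column names to string values; a missing key, None, or an
    empty string is treated as NULL.
    """
    if current_year is None:
        current_year = date.today().year
    errors = []

    def value(name):
        v = rec.get(name)
        return None if v in (None, "") else v

    def int_in_range(text, pattern, low, high):
        return bool(re.fullmatch(pattern, text)) and low <= int(text) <= high

    # Nulls are not accepted when Reqd=Yes.
    for name in REQUIRED:
        if value(name) is None:
            errors.append(f"{name}: required value is missing")

    region = value("Region_ID")
    if region is not None and region not in {"1", "2", "3", "4"}:
        errors.append("Region_ID: must be 1, 2, 3 or 4")

    year = value("Sample_Year")
    if year is not None and not int_in_range(year, r"[0-9]{4}", 1930, current_year):
        errors.append("Sample_Year: must be between 1930 and the current year")

    species = value("Species")
    if species is not None and species not in SPECIES_CODES:
        errors.append("Species: must match a code in Appendix D")

    stage = value("Stage")
    if stage is not None and stage not in STAGE_CODES:
        errors.append("Stage: must match a code in Appendix E")

    fmt = value("Format")
    if fmt is not None and fmt != "1.2X":
        errors.append("Format: must be 1.2X")

    # Optional fields are checked only when present.
    sex = value("Sex")
    if sex is not None and sex not in SEX_CODES:
        errors.append("Sex: must match a code in Appendix F")

    day = value("Sample_Day")
    if day is not None and not int_in_range(day, r"[0-9]{1,2}", 1, 31):
        errors.append("Sample_Day: must be 1 or 2 digits in range 1 through 31")

    month = value("Sample_Month")
    if month is not None and not int_in_range(month, r"[0-9]{1,2}", 1, 12):
        errors.append("Sample_Month: must be 1 or 2 digits in range 1 through 12")

    return errors
```

A row with no violations yields an empty list. A caller could collect the non-empty lists per row and report them back to the originator, as described under File Scope.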
B. Transmission Mechanism
The initial transmission mechanism addresses populating the repository with historic data. Once
a solid set of experience is obtained in this process, the mechanism for collecting current season
data will be addressed in detail.
Data Characteristics:
1. All submitted data must be presented in comma-separated value (CSV) files using a
Windows-compatible character set, preferably a version of printable ASCII.
2. All files must contain only newline-delimited records. That is, there must be one
record per line of the file.
3. Any field whose value contains an embedded comma (,) must be surrounded in
double-quotes ("). It is permissible for all fields to be reported with surrounding
double-quotes. Any double-quote delimiters will be stripped from the fields as they
are stored in the database.
4. No double-quotes are allowed as data values of any data field. The double-quote is
sequestered for exclusive use as a field delimiter.
5. The first record in the file must contain Column Names as they are defined in the
specification. This serves as inline documentation. The first row of a file will always
be skipped when the repository is loaded. But the header will make any submitted
file readily identifiable to staff managing the repository, regardless of the file’s name.
6. All fields which do not contain a data value are considered NULL. The fields for
which data are absent must be denoted in the file using two consecutive commas (,,).
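The data characteristics above can be sketched with Python's standard `csv` module. This is only an illustration under stated assumptions: the helper names `to_submission_csv` and `read_submission_csv` are hypothetical, and `COLUMNS` is a five-column subset chosen for brevity, whereas a real submission would carry the Column Names defined in the specification.

```python
import csv
import io

# Assumption: a small subset of the specification's columns, for brevity.
COLUMNS = ["Sample_ID", "Region_ID", "Sample_Year", "Species", "Comments"]


def to_submission_csv(rows, columns=COLUMNS):
    """Serialize dict rows to CSV text following the data characteristics."""
    buf = io.StringIO()
    # QUOTE_MINIMAL wraps a field in double-quotes only when it needs them,
    # e.g. when it contains an embedded comma (rule 3).
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(columns)  # rule 5: the first record holds the column names
    for row in rows:
        values = []
        for name in columns:
            value = row.get(name)
            value = "" if value is None else str(value)  # rule 6: NULL is empty
            if '"' in value:
                raise ValueError(f"{name}: double-quotes are not allowed in data")
            if "\n" in value or "\r" in value:
                raise ValueError(f"{name}: a record must fit on one line")
            values.append(value)
        writer.writerow(values)
    return buf.getvalue()


def read_submission_csv(text):
    """Parse CSV text, skip the header row, and map empty fields to None."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # the first row is always skipped on load
    return [[field if field != "" else None for field in rec] for rec in reader]
```

In this sketch a comment such as `torn, regenerated` is written with surrounding double-quotes, the quotes are stripped again on reading, and an absent Sample_ID appears as an empty field between delimiters.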
File Scope:
1. For purposes of reporting historic data, a report file shall be all the data for a
particular REGION_ID in a particular SAMPLE_YEAR. Those values must be
constant in every record of a particular file.
2. Each file submission will be validated according to the mandatory rules in the
Specification. Any rows having invalid data will be reported back to the originator.
The originator will resubmit the complete file with errors corrected.
3. As soon as a file is determined to meet the validation rules of the specification, it will
be loaded into the repository. Before inserting the file contents into the database, all
existing data for the region and year will be destroyed. This is necessary to prevent
multiple copies of records from accumulating in the repository – records are not
required to have unique keys in the repository and cannot be readily deleted or
updated on an individual basis.
4. Files may be submitted to the repository by uploading them to
ftp://FTP.TAGLAB.ORG.
5. Because there is adequate bandwidth and file space for the repository, submissions
should not be compressed.
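The replace-on-load behavior in item 3 can be illustrated with a short Python sketch. It uses SQLite purely for demonstration; the table name `asl`, its four columns, and the function names are assumptions for this example and do not describe the repository's ORACLE schema.

```python
import sqlite3


def create_repository(conn):
    """Create a small demonstration table (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS asl ("
        "region_id TEXT NOT NULL, sample_year TEXT NOT NULL, "
        "species TEXT NOT NULL, length_mm INTEGER)"
    )


def load_file(conn, rows):
    """Replace all stored rows for the file's region and year.

    `rows` is a list of dicts that have already passed validation.
    """
    # File Scope item 1: region and year must be constant within a file.
    keys = {(r["Region_ID"], r["Sample_Year"]) for r in rows}
    if len(keys) != 1:
        raise ValueError("a file must cover exactly one Region_ID and Sample_Year")
    region_id, sample_year = keys.pop()

    # One transaction: committed on success, rolled back on any error, so a
    # failed load does not leave the region and year empty.
    with conn:
        conn.execute(
            "DELETE FROM asl WHERE region_id = ? AND sample_year = ?",
            (region_id, sample_year),
        )
        conn.executemany(
            "INSERT INTO asl (region_id, sample_year, species, length_mm) "
            "VALUES (?, ?, ?, ?)",
            [(region_id, sample_year, r["Species"], r.get("Length")) for r in rows],
        )
    return len(rows)
```

Because rows carry no unique key, resubmitting a corrected file for the same region and year replaces the earlier rows rather than adding to them, while data for other regions and years are left untouched.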
Appendix A – Standard Tix Management Area Codes
| CODE | DESCRIPTION |
|---|---|
| 9 | CANADA |
| A | JUNEAU/YAKUTAT |
| B | KETCHIKAN/CRAIG |
| C | PETERSBURG/WRANGELL |
| D | SITKA/PELICAN |
| E | PRINCE WILLIAM SOUND |
| F | EEZ |
| H | COOK INLET |
| K | KODIAK |
| L | CHIGNIK |
| M | ALASKA PENINSULA |
| O | DUTCH HARBOR |
| Q | BERING SEA |
| R | ADAK/WEST ALEUTIANS |
| S | SOUTHEAST INSIDE (1990-1998) |
| T | BRISTOL BAY |
| W | KUSKOKWIM |
| X | KOTZEBUE |
| Y | YUKON |
| Z | NORTON SOUND |
Appendix B – Standard Harvest Codes
| CODE | DESCRIPTION |
|---|---|
| 11 | TRADITIONAL |
| 12 | TERMINAL AREA |
| 13 | EXPERIMENTAL AREA |
| 14 | EXPERIMENTAL GEAR |
| 17 | M-I-C |
| 18 | CONFISCATED |
| 21 | PNP FISH |
| 22 | PNP CARCASSES |
| 23 | STATE FISH |
| 24 | STATE CARCASSES |
| 25 | FEDERAL FISH |
| 26 | FEDERAL CARCASSES |
| 27 | PNP DONATED |
| 28 | PNP DISCARDED |
| 31 | DERBY |
| 33 | DISCARDED |
| 34 | OILED WASTE |
| 35 | EDUCATIONAL |
| 36 | COMMERCIAL DONATED |
| 41 | TEST RUN ASSESSMENT |
| 42 | TEST SPECIAL STUDY |
| 43 | TEST STOCK ASSESSMENT |
Appendix C – Standard Length Type Codes
| CODE | DESCRIPTION |
|---|---|
| 00 | LENGTH NOT TAKEN |
| 01 | TIP OF SNOUT TO FORK OF TAIL |
| 02 | MID-EYE TO FORK OF TAIL |
| 03 | POST ORBIT TO FORK OF TAIL |
| 04 | MID-EYE TO HYPURAL PLATE |
| 05 | POST ORBIT TO HYPURAL PLATE |
| 06 | TIP OF SNOUT TO TIP OF TAIL |
| 07 | CLEITHRAL ARCH TO TIP OF TAIL |
| 08 | CALCULATED FORK LENGTH |
Appendix D – Standard Species Codes
| CODE | DESCRIPTION |
|---|---|
| 410 | CHINOOK |
| 420 | SOCKEYE |
| 430 | COHO |
| 440 | PINK |
| 450 | CHUM |
| 470 | CUTTHROAT |
| 540 | STEELHEAD |
| 666 | ATLANTIC |
Appendix E – Standard Stage Codes
| CODE | DESCRIPTION |
|---|---|
| A | ADULT |
| E | EMERGENT FRY |
| F | FED FRY |
| G | FINGERLING |
| I | IMMATURE |
| J | JUVENILE |
| P | PRESMOLT |
| S | SMOLT |
Appendix F – Standard Sex Codes
| CODE | DESCRIPTION |
|---|---|
| 1 | MALE |
| 2 | FEMALE |
| 3 | EXAMINED BUT DID NOT IDENTIFY |
Appendix G – Typical Age Error Codes
| CODE | DESCRIPTION |
|---|---|
| 1 | OTOLITH |
| 2 | INVERTED |
| 3 | REGENERATED |
| 4 | ILLEGIBLE |
| 5 | MISSING |
| 6 | REABSORBED |
| 7 | WRONG SPECIES |
| 8 | NOT PREFERRED |
Appendix H – Typical Gear Codes
| CODE | DESCRIPTION |
|---|---|
| 00 | TRAP |
| 01 | PURSE SEINE |
| 02 | BEACH SEINE |
| 03 | DRIFT GILLNET |
| 04 | SET GILLNET |
| 05 | HAND TROLL |
| 06 | LONG LINE |
| 07 | OTTER TRAWL |
| 08 | FISHWHEEL |
| 09 | POTS |
| 10 | SPORT HOOK AND LINE |
| 11 | HERRING PURSE SEINE |
| 12 | HANDPICKED |
| 13 | DIP NET |
| 14 | WEIR |
| 15 | POWER TROLL |
| 17 | BEAM TRAWL |
| 18 | SHOVEL |
| 19 | WEIR |
| 90 | TRAP |
| 91 | POTS |
Appendix I – Typical Project Codes
| CODE | DESCRIPTION |
|---|---|
| 1 | COMMERCIAL HARVEST |
| 2 | SUBSISTENCE HARVEST |
| 3 | ESCAPEMENT (TOWER, WEIR, SONAR SITE, ETC.) |
| 4 | ESCAPEMENT – SPAWNING GROUNDS |
| 5 | TEST FISHING |
| 6 | SPORT CATCH (MARINE) |
| 7 | SPORT CATCH (FRESHWATER) |
Appendix J.
This ASL Report was generated on 12/28/2004 11:47:31 AM by the: Alaska Department of Fish and Game
Mark, Tag, and Age Laboratory
cwt_web@fishgame.state.ak.us
(907) 465-4092
10107 Bentwood Place
PO BOX 25526
Juneau, AK 99802
Criteria for this report included:
Years: 2001, 2000
Species: 450 (Chum Salmon), 420 (Sockeye Salmon)
Districts: 325
SubDistricts: All
Location: %
| Year | Region | ManagementArea | Species | BeginDate | EndDate | District | Subdistrict | LocationCode | Location |
|---|---|---|---|---|---|---|---|---|---|
| 2000 | 2 | BRISTOLBAY | 420 | 6/30/2000 | 6/30/2000 | 325 | 10 | 401 | Igushik Inriver Test |
| 2000 | 2 | BRISTOLBAY | 420 | 7/4/2000 | 7/4/2000 | 325 | 10 | 301 | Igushik Section Set, Ekuk |
| 2000 | 2 | BRISTOLBAY | 420 | 6/28/2000 | 6/28/2000 | 325 | 10 | 302 | Igushik Section, Dillingham |
| 2000 | 2 | BRISTOLBAY | 420 | 6/25/2000 | 6/25/2000 | 325 | 10 | 101 | Igushik Tower |
| 2000 | 2 | BRISTOLBAY | 420 | 7/7/2000 | 7/7/2000 | 325 | 30 | 401 | Nushagak District Test |
| 2000 | 2 | BRISTOLBAY | 420 | 7/1/2000 | 7/1/2000 | 325 | 30 | 302 | Nushagak Section - Drift |
| 2000 | 2 | BRISTOLBAY | 420 | 6/17/2000 | 6/17/2000 | 325 | 30 | 101 | Nushagak Sonar/Tower |
| 2000 | 2 | BRISTOLBAY | 420 | 7/3/2000 | 7/3/2000 | 325 | 30 | 101 | Nuyakuk Tower |
| 2000 | 2 | BRISTOLBAY | 420 | 6/27/2000 | 6/27/2000 | 325 | 40 | 302 | Wood River SHA |
| 2000 | 2 | BRISTOLBAY | 420 | 6/26/2000 | 6/26/2000 | 325 | 30 | 101 | Wood River Tower |
| 2000 | 2 | BRISTOLBAY | 450 | 6/28/2000 | 6/28/2000 | 325 | 30 | 101 | Nushagak Escapement |
| 2000 | 2 | BRISTOLBAY | 450 | 6/30/2000 | 6/30/2000 | 325 | 30 | 302 | Nushagak Section, Dillingham |
| 2001 | 2 | BRISTOLBAY | 420 | 6/22/2001 | 6/22/2001 | 325 | 10 | 301 | Igushik Section |
| 2001 | 2 | BRISTOLBAY | 420 | 6/23/2001 | 6/23/2001 | 325 | 11 | 302 | Igushik Section Set |
| 2001 | 2 | BRISTOLBAY | 420 | 6/21/2001 | 6/21/2001 | 325 | 10 | 101 | Igushik Tower |
| 2001 | 2 | BRISTOLBAY | 420 | 6/24/2001 | 6/24/2001 | 325 | 30 | 302 | Nushagak Section - Drift |
| 2001 | 2 | BRISTOLBAY | 420 | 6/12/2001 | 6/12/2001 | 325 | 30 | 101 | Nushagak Sonar/Tower |
| 2001 | 2 | BRISTOLBAY | 420 | 6/30/2001 | 6/30/2001 | 325 | 30 | 101 | Nuyakuk Tower |
| 2001 | 2 | BRISTOLBAY | 420 | 6/24/2001 | 6/24/2001 | 325 | 30 | 101 | Wood River Tower |
| 2001 | 2 | BRISTOLBAY | 450 | 6/23/2001 | 6/23/2001 | 325 | 10 | 302 | Igushik Section, Dillingham |
| 2001 | 2 | BRISTOLBAY | 450 | 6/28/2001 | 6/28/2001 | 325 | 30 | 101 | Nushagak Escapement |
| 2001 | 2 | BRISTOLBAY | 450 | 6/24/2001 | 6/24/2001 | 325 | 30 | 302 | Nushagak Section, Dillingham |
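Project, Gear, NumberofCards (62 values for the 22 report rows, in report order; some cells are blank, so row alignment is not recoverable): 5, 4, 7, 1, 4, 8, 1, 12, 3, 3, 40, 5, 2, 1, 3, 67, 3, 3, 49, 3, 25, 1, 3, 29, 3, 4, 35, 3, 2, 4, 1, 3, 7, 1, 9, 1, 3, 16, 3, 2, 38, 1, 3, 109, 3, 2, 59, 3, 2, 29, 3, 3, 48, 1, 3, 4, 3, 3, 35, 1, 3, 31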
Data
Appendix K.
This ASL Report was generated on 12/28/2004 1:16:06 PM by the: Alaska Department of Fish and Game
Mark, Tag, and Age Laboratory
cwt_web@fishgame.state.ak.us
(907) 465-4092
10107 Bentwood Place
PO BOX 25526
Juneau, AK 99802
Criteria for this report included:
Years: 2002
Species: 420 (Sockeye)
Regions: All
Districts: 101
Harvest: All
Gear: All
Project: All
Stat Week: 1, 54
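| Species | Year | Region | Area | Location | Stat Week | Harvest | Project | Gear | Sex | Length | Comment |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 550 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 570 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 632 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 578 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 595 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 660 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 566 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 632 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 658 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 577 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | MALE | 533 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 536 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 586 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 596 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 556 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 549 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 582 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 550 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 534 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 572 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 531 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 540 | |
| SOCKEYE | 2002 | 1 | 10111 | 6 | 25 | TRADITIONAL | 1 | 3 | FEMALE | 530 | |

Age (21 values for the 23 report rows, in report order; two cells are blank, so row alignment is not recoverable): 12, 12, 13, 22, 12, 22, 23, 12, 22, 12, 12, 13, 12, 12, 23, 22, 12, 13, 22, 12, 12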
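The Age values in this report (12, 13, 22, 23) appear to combine the FW_Age and SW_Age fields of the specification, which are recorded using the European method. The sketch below assumes that reading, namely that the first digit is the freshwater age and the second the saltwater age. Under the European convention, total age counted from the brood year is the sum of the two plus one. The function name `split_european_age` is hypothetical.

```python
def split_european_age(age_code):
    """Split a two-digit European age code into its parts.

    Assumes the first digit is the freshwater age (FW_Age) and the second
    digit is the saltwater age (SW_Age). Returns (fw_age, sw_age, total_age),
    where total_age = fw_age + sw_age + 1.
    """
    if len(age_code) != 2 or not (age_code.isascii() and age_code.isdigit()):
        raise ValueError(f"expected two digits, got {age_code!r}")
    fw_age = int(age_code[0])
    sw_age = int(age_code[1])
    return fw_age, sw_age, fw_age + sw_age + 1
```

For example, a code of 12 would be read as one freshwater year and two saltwater years, for a total age of four.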
Appendix L.