Data Management Needs and Challenges for Telemetry Scientists

Josh M London
Wildlife Biologist, Polar Ecosystems Program
National Marine Mammal Laboratory
NOAA NMFS Alaska Fisheries Science Center

Temptation to identify biologists as the source for the raw data

The Tip of a Complex Iceberg
  Publications
  Contract reports
  Status/Listing Review
  Data Management
    Derived products
    Movement model
    Data quality control
    Synthesis
  Narrowing Bottleneck
    Many biologists lack the skills and training for effective, scalable database design and data management practices
  Field Work and Study Design
    Deployment of tags (location, age/sex, time)
    Tag design/vendor
    Tag programming
    Opportunistic vs. planned
    Hypothesis
    Agency needs/mandates
    Funding initiatives

Field Work & Tag Deployment
  When?
  Where?
  Which Tag/Vendor?
  Which Age? Which Sex? (Do we have a choice?)
  Tag Programming
  Deployment Length (attachment type)

Limited Tools for Managing Raw Telemetry Data
  ‘raw’ data via Argos as CSV/Text
  Process w/ Vendor Software (behavior data)
    Typically output as CSV
  Field data about animal (e.g. ID, species, sex, age, health)
  Needs
    Explore ‘raw’ data
    Address hypotheses
    Visualize movement/use
    Synthesize w/ dependent (e.g. health, age) and independent data (e.g.
    other animals, remotely sensed)

Biologists Not Trained in Large Scale Data Management
  Biologists
    Excel and/or Access
    ESRI ArcMap (shapefiles)
    Google Earth
    Mouse Click Interaction
    Programming (Visual Basic, R, Python): recipe driven … not developers
  Data Manager
    Postgres/PostGIS, Oracle, MySQL, SQL Server
    Normalization and Efficient Design
    Scripting, Jobs, Transactions
    Data Integrity
    Automation, Reproducibility

My Perspective
  To address complex questions related to marine mammal telemetry and understanding animal ecology, I had to become more of a data manager
  … And, in the process, I’ve become less of a biologist

  Start (2006)
    Argos Monthly CDs
    SatPack
    Access Database
    Excel Files (limited to 65k rows)
    Large, Flat Tables
    No Central Repository
  Current System
    Nightly FTP Argos Push
    Nightly Data Processing
    CSV/External Oracle Table
    PL/SQL Procedures
    Developed/Designed with Training via Google Search

My Perspective: Current Limitations
  Data access requires a minimum level of technical skills (basic SQL, Oracle framework, Oracle APEX, R spatial tools, ArcMap)
  Single Point of Access/Failure (me)
  Limited Documentation of Design
  Design May Not be Optimal/Appropriate
  Main Objective is to Provide Data to Analysts – Not necessarily designed for providing data to the public

My Perspective: Greatest Needs – Research Program
  Data Management and Design Consultation
  Data Design & Documentation Portal (user-friendly metadata)
  Low Tech Exploration Tools
  Database and Application Developers (data flow and data input)
  Training Opportunities

My Perspective: Greatest Needs – External to Program?
  Provide Meaningful Public Access to Data
  A Clear Data Sharing Policy w/ Best Practices
  Encourage/Facilitate Scientific Collaboration
  Meet Agency Needs and Requirements
  How to Communicate Scientific Knowledge in the Modern/Digital Age: sharing knowledge/expertise is just as important as sharing data
  Publish Data Once

My Perspective: Challenges / Road Blocks
  Limited Funds and Priorities – appropriate resources for doing the priority analysis and science are not available, let alone the resources to distribute data responsibly
  Database design/management is often in the hands of the least skilled users
  IT Policies, Investments, and Infrastructure Vary Across Institutions
  No standard(s) for communicating and sharing ‘raw’ animal telemetry data. What is ‘raw’ data?
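
Illustrative sketch (not part of the original slides). The slides contrast "Large, Flat Tables" with "Normalization and Efficient Design", "Transactions" and "Data Integrity". The Python example below shows one way that contrast might look in practice. It is a minimal sketch under stated assumptions, not the program's actual system: it uses SQLite instead of Oracle so it runs anywhere, and the table names, column names, and sample rows are all hypothetical.

```python
import csv
import io
import sqlite3

# Made-up flat export: animal details are repeated on every location row.
FLAT_CSV = """deploy_id,species,sex,age_class,obs_time,latitude,longitude
D001,seal,F,adult,2009-06-01T00:10:00,64.50,-165.40
D001,seal,F,adult,2009-06-01T06:25:00,64.62,-165.75
D002,seal,M,pup,2009-06-01T01:05:00,58.10,-170.20
"""

# Hypothetical normalized design: one row per deployment, one per location.
SCHEMA = """
CREATE TABLE deployment (
    deploy_id TEXT PRIMARY KEY,
    species   TEXT NOT NULL,
    sex       TEXT CHECK (sex IN ('F', 'M', 'U')),
    age_class TEXT
);
CREATE TABLE location (
    deploy_id TEXT NOT NULL REFERENCES deployment (deploy_id),
    obs_time  TEXT NOT NULL,
    latitude  REAL NOT NULL CHECK (latitude BETWEEN -90 AND 90),
    longitude REAL NOT NULL CHECK (longitude BETWEEN -180 AND 180),
    PRIMARY KEY (deploy_id, obs_time)
);
"""


def open_database():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce the REFERENCES clause
    conn.executescript(SCHEMA)
    return conn


def load_flat_csv(conn, text):
    """Split flat rows into the two tables inside a single transaction."""
    rows = list(csv.DictReader(io.StringIO(text)))
    deployments = [
        (r["deploy_id"], r["species"], r["sex"], r["age_class"]) for r in rows
    ]
    locations = [
        (r["deploy_id"], r["obs_time"], float(r["latitude"]), float(r["longitude"]))
        for r in rows
    ]
    with conn:  # commits on success, rolls back if any row is rejected
        conn.executemany(
            "INSERT INTO deployment VALUES (?, ?, ?, ?) "
            "ON CONFLICT (deploy_id) DO NOTHING",
            deployments,
        )
        conn.executemany(
            "INSERT INTO location VALUES (?, ?, ?, ?) "
            "ON CONFLICT (deploy_id, obs_time) DO NOTHING",
            locations,
        )


def count_rows(conn, table):
    """Return the number of rows in one of the two sketch tables."""
    if table not in ("deployment", "location"):
        raise ValueError("unknown table: " + table)
    return conn.execute("SELECT COUNT(*) FROM " + table).fetchone()[0]


if __name__ == "__main__":
    conn = open_database()
    load_flat_csv(conn, FLAT_CSV)
    load_flat_csv(conn, FLAT_CSV)  # a repeated nightly load adds no duplicates
    print(count_rows(conn, "deployment"), count_rows(conn, "location"))
```

Because the keys and CHECK constraints live in the database, a repeated load does not duplicate rows, and a row with an impossible latitude causes the whole batch to be rolled back instead of being stored silently. A flat spreadsheet offers neither guarantee.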