Österreichische Tage der Digitalen Geisteswissenschaften (Austrian Days of Digital Humanities)
save the data: workshop on digital repositories, Dec 2nd 2014
Johannes Spitzbart
Phonogrammarchiv, Austrian Academy of Sciences

Collecting, preserving and documenting unique primary audiovisual research sources and making them available for research use
• Scholarly research (ethnomusicology, ethnology, linguistics, context & discourse)
• Technical research & development (standardization, e.g. IASA-TC; preservation, replay & digitization of recordings; patent "Method for Reconditioning of Data Carriers") and training

Role in the digital research infrastructure:
• Digitization, long-term preservation, data accessibility
• Metadata capture, comprehensive documentation, database access, searchability (online catalogue)

Manually managed system:
• File server: RAID 10 (mirrored and striped) HDD array, 40 TB usable capacity
• LTO tape library (two backup copies)
• Long-term ("eternal") preservation through continuous data migration and, if needed, format migration
• Checksums for file integrity checks
Pros: easy file/folder management, in-house maintenance, flexibility (independent of proprietary management software; any file format can be hosted), probably easier disaster management
Con: additional effort (human resources) for manual file/folder management and for linking files to the database

System components:
• Several workstations, connected via a network switch (1-Gigabit Ethernet)
• File servers (Gigabit Ethernet), including a workspace for temporary data
• 2 independent, scalable hard-drive storage units in a storage area network environment (mirrored hard drives, RAID 10)
• Tape library with LTO (tape data storage) drive; data backups managed by the system administrator
Stored content:
• Digitized portion (approx. 50%) of the tape/disc collection (audio)
• Born-digital output of supported external and in-house projects (audio)
• Digitized acquired collections (audio)

Database:
• MySQL DB with PHP frontend, custom-developed (daily backup onto the storage system)
• Elaborate structure due to archival documentation needs (comprehensive metadata capture)
• Taxonomies & controlled vocabularies (Hornbostel-Sachs, languages, ethnic groups, etc.)
• Different access levels (visitors: read-only; admins: maintain taxonomies and controlled vocabularies)
• AV playback in the browser (MP3, MPEG-2)
• English version (work in progress)
Content: comprehensive documentation (technical, content-descriptive and contextual) at item level (= recording), which is a prerequisite for accessibility and potential use

… slimmed-down copy of the in-house database on a dedicated "exposed" server
• Reduced data set & short samples
• Focus on usability and sophisticated search options
• Connected to Europeana through Dismarc (updated weekly)

Open Access?
• Legal constraints (intellectual property rights)
• Ethical issues (sensitive content)
• Therefore, full-length online publication has not been possible to date

INPUT (external, main part): supported research projects get ...
◦ Methodological support/advice
◦ Technical support (recording equipment & training)
◦ Preservation of the outcome (data: field recordings, metadata)
◦ Exclusive usage rights for six years
… and in return provide ...
◦ The original field recordings
◦ Their description (= predefined set of metadata) for proper documentation

OUTPUT: interested users ...
◦ Browse the online catalogue, listen to samples, inquire via email
◦ Are provided with access copies via download (small handling fees, or fixed rates for commercial use, e.g. media, exhibitions)
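The file server is described as a RAID 10 array with 40 TB of usable capacity. Because RAID 10 mirrors every disk (stripes across mirrored pairs), usable capacity is half the raw capacity. The raw figure below is not stated on the slides; it is derived under the assumption of standard two-way mirroring, with n drives of capacity c each and filesystem overhead ignored:

```latex
C_{\text{usable}} = \frac{n \cdot c}{2}
\quad\Longrightarrow\quad
C_{\text{raw}} = n \cdot c = 2 \cdot C_{\text{usable}} = 2 \cdot 40\,\text{TB} = 80\,\text{TB}
```

The price of halving capacity is redundancy: the array keeps running as long as at least one disk in each mirrored pair survives, while the two LTO copies cover failures of the array as a whole.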
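The checksums used for file integrity checks on the manually managed file server can be illustrated with a small fixity-check sketch. The slides do not name the checksum algorithm or how checksums are stored; this example assumes SHA-256 via Python's `hashlib` and a hypothetical manifest (a dict mapping relative file paths to stored hex digests). Files are read in chunks so large audio files do not need to fit in memory.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 1 << 20  # read in 1 MiB chunks so large audio files need not fit in memory


def file_checksum(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file, computed incrementally."""
    digest = hashlib.new(algorithm)
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: dict[str, str], root: Path,
                    algorithm: str = "sha256") -> list[str]:
    """Return relative paths that are missing or whose checksum no longer matches.

    `manifest` is a hypothetical structure: relative path -> stored hex digest.
    """
    failures = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or file_checksum(target, algorithm) != expected:
            failures.append(rel_path)
    return failures
```

Run periodically (and after every migration to new storage or a new LTO generation), an empty result means every listed file is still bit-identical to its recorded state; any returned path points to a file that should be restored from one of the tape backups.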
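The database's access levels (visitors read-only, admins maintaining taxonomies and controlled vocabularies) amount to a simple role-to-permission mapping. The actual system is a custom PHP/MySQL application whose code is not shown. The following Python sketch is purely illustrative: the role names mirror the slides, but the action names, the `add_term` helper and the in-memory vocabulary store are hypothetical.

```python
from enum import Enum


class Role(Enum):
    VISITOR = "visitor"  # read-only access to records
    ADMIN = "admin"      # may also edit taxonomies and controlled vocabularies


# Hypothetical permission table; action names are illustrative only.
PERMISSIONS: dict[Role, frozenset[str]] = {
    Role.VISITOR: frozenset({"read_record"}),
    Role.ADMIN: frozenset({"read_record", "edit_taxonomy", "edit_vocabulary"}),
}


def is_allowed(role: Role, action: str) -> bool:
    """Check whether a role may perform an action."""
    return action in PERMISSIONS[role]


def add_term(role: Role, vocabularies: dict[str, set[str]],
             vocab_name: str, term: str) -> None:
    """Add a term to a controlled vocabulary, e.g. 'languages' (hypothetical helper)."""
    if not is_allowed(role, "edit_vocabulary"):
        raise PermissionError(f"{role.value} may not edit controlled vocabularies")
    vocabularies.setdefault(vocab_name, set()).add(term)
```

Keeping vocabulary edits restricted to admins is what keeps the controlled vocabularies controlled: cataloguers can only pick existing terms, so searches over languages or ethnic groups stay consistent.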
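Supported research projects receive exclusive usage rights for six years. The slides do not say when this period starts. The sketch below assumes, hypothetically, that it runs from the deposit date and ends on its sixth anniversary. A deposit on 29 February is mapped to 28 February when the target year is not a leap year.

```python
from datetime import date

EXCLUSIVE_YEARS = 6  # six-year exclusive usage right, as stated on the slides


def exclusive_period_end(start: date, years: int = EXCLUSIVE_YEARS) -> date:
    """Return the anniversary `years` after `start` (assumed end of the exclusive period)."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:  # only raised for 29 February in a non-leap target year
        return start.replace(year=start.year + years, day=28)


def is_under_exclusive_use(start: date, today: date) -> bool:
    """True while the (assumed) exclusive period is still running."""
    return today < exclusive_period_end(start)
```

For example, a recording deposited on 2 December 2014 would, under this assumption, leave the exclusive period on 2 December 2020, after which access copies could be offered to other interested users.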