Case studies of practical data management
Ben Kreunen
Technical Support Officer
University Digitisation Service

Putting Metadata to Work

What?
• Print collection
  – Image management tool created as a spin-off from the data used for scanning
• Thesis on demand
  – Incorporating administrative data into the scanning process to improve business processes
• Asset stocktake (pilot)
  – From THEMIS to iPhone and back
• Creating a contact list from an org chart (concept)
  – Linking business processes and data

Why?
• Time is money
  – Save 10 seconds on a task performed 2,500 times and you save 1 working day (25,000 seconds, or just under 7 hours)
• Doing repetitive tasks sucks
• Doing repetitive tasks again sucks
• Reducing mental fatigue

How?
• Reduce the time it takes to do stuff
  – Automatically enter related data
  – Collect data that’s already been entered somewhere else
  – Select from lists rather than type
  – Re-use data that’s been entered before
  – Script repetitive processes
• Simplify interface design
  – Only show the data you need at the time
  – Give visual feedback

Who?
• People who manage the process
• People who DO the process
• People with technical skills
• Working together!

Simple tools, clever connections
http://www.philohome.com/panobot2/panobot2.htm

Metadata
• Data about data

How do we manage metadata?
• Data about data

How do we manage data?
• Data about data
  – Excel?
  – Access?
  – Database?

How do we manage relational data?
• Data about data
  – Excel?
  – Access?
  – Relational database?

How do we manage relational data efficiently?
• Data about data
  – Excel?
  – Access?
  – Relational database?

What is data management?
• Making lists of stuff
• Finding things in lists of stuff
• Sharing lists of stuff
• Editing lists of stuff
• Combining lists of stuff
• etc.

Print Collection
Re-using data • Open source tools • Usability

Data requirement
• There must be exactly one ID number linking each image to its catalogue record

Before
• ~3,500 images (2 TB) on 4 external HDDs, with no index
• File names based on a partial accession number
• Online images served via KE EMU
• Duplicate accession numbers exist
• Number of duplicate IDs unknown
• ~4,500 prints to be scanned

Preparation for scanning
• Prepare data
  – Export data from EMU
  – Create a separate database to analyse/prepare data
  – Locate duplicate records (12)
  – List existing images and calculate ID numbers
  – Locate invalid file names (7)
  – Copy master files to network storage (1 TB)

Scanning requirements
• Previous images scanned with a colour chart
• Getty “standard”, i.e. multiple versions at different sizes
• All versions of the images include the colour chart (only the archive master should; other versions should be cropped)
• Archive master files = 50% of total file size

Planning
• What is the best way to capture a master image and a cropped version?
• Should cropped versions of the existing images be created?
• Do we need to create a cropped version?
  – Saves time digitising
  – Reduces storage costs by ~40%

Planning
• What is required to crop images on demand?
• Is it possible?
• Can a standard computer do it?
• What data do we need?
• How do we collect it?
• How are the images used?
• How are requests processed?
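The data preparation steps listed under “Preparation for scanning” (locating duplicate records, calculating ID numbers from existing file names, locating invalid file names) can be sketched in a few lines of Python. This is a minimal sketch, not the tool used in the project: the `<year>_<sequence>.tif` naming pattern, the `YYYY.NNNN` ID format and the `audit` function are hypothetical stand-ins for the real scheme.

```python
import re
from collections import Counter

# Hypothetical naming scheme: "<year>_<sequence>.tif", e.g. "1959_0012.tif",
# standing for catalogue ID "1959.0012". The real scheme is not given in the
# slides; change NAME_PATTERN and the ID format below to suit.
NAME_PATTERN = re.compile(r"^(\d{4})_(\d{4})\.tif$", re.IGNORECASE)


def audit(catalogue_ids, filenames):
    """Cross-check IDs exported from the catalogue against existing image files."""
    # IDs that appear more than once in the catalogue export.
    counts = Counter(catalogue_ids)
    duplicates = sorted(i for i, n in counts.items() if n > 1)

    index = {}     # calculated ID -> file name (a later file with the same ID wins)
    invalid = []   # names that do not fit the naming scheme
    for name in filenames:
        m = NAME_PATTERN.match(name)
        if m is None:
            invalid.append(name)
        else:
            index[f"{m.group(1)}.{m.group(2)}"] = name

    # Images whose calculated ID has no catalogue record.
    unmatched = sorted(set(index) - set(counts))
    return {"duplicates": duplicates, "invalid": invalid,
            "unmatched": unmatched, "index": index}
```

For example, `audit(["1959.0012", "1959.0013", "1959.0013"], ["1959_0012.tif", "1959_0014.tif", "scan final.tif"])` reports `1959.0013` as a duplicate, `scan final.tif` as invalid and `1959.0014` as an image with no catalogue record.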
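One possible answer to “What is required to crop images on demand?”, in the spirit of the ImageMagick + coordinates + batch file approach named on the Mini Project slide, is to record one crop box per master image and generate the ImageMagick command only when an image is requested. The sketch assumes ImageMagick 7 (program `magick`; with ImageMagick 6 it is `convert`), a crop box stored as x, y, width, height in pixels, and hypothetical function names `crop_command` and `crop`.

```python
import subprocess


def crop_command(master, output, box, max_size=None, program="magick"):
    """Build an ImageMagick command that crops the colour chart off a master image.

    box is (x, y, width, height) in pixels: the region to KEEP.
    """
    x, y, w, h = box
    # -crop WxH+X+Y keeps that region; +repage resets the virtual canvas
    # so the output does not carry the original page offset.
    args = [program, master, "-crop", f"{w}x{h}+{x}+{y}", "+repage"]
    if max_size is not None:
        # Fit within max_size x max_size, keeping the aspect ratio;
        # the ">" flag means only shrink larger images, never enlarge.
        args += ["-resize", f"{max_size}x{max_size}>"]
    args.append(output)
    return args


def crop(master, output, box, **kwargs):
    """Run the command; requires ImageMagick to be installed."""
    subprocess.run(crop_command(master, output, box, **kwargs), check=True)
```

For example, `crop_command("1959_0012.tif", "1959_0012.jpg", (120, 80, 4000, 3000), max_size=1600)` returns `["magick", "1959_0012.tif", "-crop", "4000x3000+120+80", "+repage", "-resize", "1600x1600>", "1959_0012.jpg"]`. Passing an argument list to `subprocess.run` avoids shell quoting; written into a Windows batch file instead, the `1600x1600>` argument would need double quotes because `>` is a redirection character there.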
Mini Project
• ImageMagick + coordinates + batch file = automated cropping on demand
• Hack techniques to collect data
• Raised awareness of other possible uses

Acquiring coordinates for cropping

Scanning issues
• Prints are delivered in random ID-number order
  – Locating 1 ID number in a list of 8,000…

What broke
• Not all ID numbers are unique
  – naming scheme had to be modified
• “Modified” scanning procedure to deal with annoyances was prone to the occasional error
  – error, cause and solution identified by the scanner operator
• Image from a previous project did not match its ID number

Helping others
• A small step turned our project work into a tool to improve management of the image collection
  – Crop, resize and format images on demand
  – Fast response to requests for images
  – Images more secure
  – Images accessed using familiar identifiers
• Runtime version of the database to be given to the collection manager
• Total software cost: $0

Helping others
Can we browse the images scanned to date?

Helping others
That’s great... Can you do the same thing for everything else you’ve scanned?
(currently 250,000 files)

Thesis on demand service
Automating administrative processes • Sharing administrative data • Minimising data entry

About the service
• A researcher or academic library requests a copy of a thesis for research purposes
• The thesis is scanned (for a fee) and delivered to the client as:
  – Print
  – CD
  – Cloudstor
• Recently relocated from the Baillieu to UDS

Challenges
• Incorporating administrative data and processes
• Multiple time frames depending on the delivery method
• Variable timing for the arrival of theses
  – accessed locally or retrieved from offsite archives
• Process is now split across 2 departments

The Request

Data entry
• Thesis details
  – Scan barcode
  – Automatic collection of required and optional metadata
• Delivery method
  – check box
  – email address if Cloudstor
• Date request received
• Urgency
  – check box

Re-using data
• Date by which the item is to be scanned is calculated from:
  – Date received
  – Delivery method
  – Urgency
• Work list sorted by “completion status” and “date due”
• Output filenames automatically generated from metadata (author, year)

Re-using data
• File delivery is automated as much as possible:
  – Copy and rename file to pickup folder
  – Generate email message to notify Special Collections and Repository team
  – Load the Cloudstor interface if selected as the delivery method
• Entries for each form field generated and copied to the clipboard
• Upload form completed with 8 mouse clicks

What broke
• Client queries could not be answered immediately because of the split
  – no direct access to our data
  – daily export of a PDF report enables most queries to be dealt with
• Not all theses have barcodes
• Not all theses are catalogued
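The “Re-using data” rules for the thesis service (a scan-by date derived from the date received, delivery method and urgency; a work list sorted by completion status and due date; file names generated from author and year) might be sketched as follows. The turnaround values in `TURNAROUND` and `URGENT_TURNAROUND`, the weekend-skipping rule and the `surname_given_year` file-name format are assumptions for illustration, not the service’s actual rules.

```python
import re
import unicodedata
from datetime import date, timedelta

# Hypothetical turnaround times in working days per delivery method;
# the real service rules are not given in the slides.
TURNAROUND = {"print": 10, "cd": 10, "cloudstor": 5}
URGENT_TURNAROUND = 2  # hypothetical: urgent requests, any delivery method


def add_working_days(start, days):
    """Return the date `days` working days (Monday to Friday) after `start`."""
    current = start
    while days > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # weekday(): 0 = Monday ... 6 = Sunday
            days -= 1
    return current


def scan_by(received, delivery, urgent=False):
    """Date by which the thesis should be scanned."""
    days = URGENT_TURNAROUND if urgent else TURNAROUND[delivery]
    return add_working_days(received, days)


def work_list(requests):
    """Outstanding requests first (False sorts before True), then earliest due date."""
    return sorted(requests, key=lambda r: (r["completed"], r["due"]))


def output_filename(author, year, ext="pdf"):
    """Build a file name such as 'smith_john_1987.pdf' from catalogue metadata."""
    # Decompose accented letters and drop the accents, e.g. 'Müller' -> 'Muller'.
    plain = unicodedata.normalize("NFKD", author).encode("ascii", "ignore").decode("ascii")
    # Keep runs of letters/digits only and join them with underscores.
    slug = "_".join(re.findall(r"[a-z0-9]+", plain.lower()))
    return f"{slug}_{year}.{ext}"
```

Under these assumed rules, a Cloudstor request received on Friday 1 June 2012 gives `scan_by(date(2012, 6, 1), "cloudstor") == date(2012, 6, 8)`, and `output_filename("Smith, John", 1987)` gives `"smith_john_1987.pdf"`.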
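The automated delivery steps for the thesis service (copy and rename the file to a pickup folder, then generate a notification email) can be sketched with the Python standard library. The folder locations, addresses and the `deliver` and `notification` names are hypothetical; the message is only built here, not sent.

```python
import shutil
from email.message import EmailMessage
from pathlib import Path


def deliver(scan, pickup_dir, new_name):
    """Copy a finished scan into the pickup folder under its generated name."""
    pickup_dir = Path(pickup_dir)
    pickup_dir.mkdir(parents=True, exist_ok=True)
    target = pickup_dir / new_name
    shutil.copy2(scan, target)  # copy2 also preserves the file's timestamps
    return target


def notification(target, request_id, to_addrs, sender):
    """Draft the email telling other teams that a file is ready (not sent here)."""
    msg = EmailMessage()
    msg["Subject"] = f"Thesis request {request_id}: file ready"
    msg["From"] = sender
    msg["To"] = ", ".join(to_addrs)
    msg.set_content(
        f"The scanned thesis for request {request_id} has been copied to:\n{target}\n"
    )
    return msg
```

To actually send the draft, the standard-library route would be `smtplib.SMTP(host).send_message(msg)`, with `host` being whatever mail server the organisation uses.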
Outcomes
• Improved client communication
• Improved communication between departments
• Reduced data entry
• Improved quality of metadata
• Simplified reporting based on administrative data

THEMIS Asset stocktake
Local management with centralised data • Simplifying data entry • Synchronising authoritative data

Issues
Error 401.303 Text box length exceeded. Refer to KB1237 for assistance with this error

Issues
• Data in THEMIS is out of date
• No direct access to update THEMIS
  – Updates generate significant workload for 2 organisations
• Asset data from other sources (CMDB) is out of date
• Previous updates were incomplete

The Key(?)
• Excel “wizard” that can be imported into THEMIS

Usability
• Where is the data I need to see?

The Key
• Not user friendly, BUT
• Consistent data structure for receiving and updating data
• Create a local copy for collecting current data
• Populate with “static” data from THEMIS
• Compare “live” data with THEMIS
• Export current data to THEMIS

The Pilot
• FileMaker 12 database to handle data
• Accessed via FileMaker Go on iPhone
• Integrate with the CNS barcode app to scan barcodes
• Streamline onsite data collection

Simplify data display

Potential spin-offs
• Re-use data for local asset management processes
• Warn me X weeks before a computer is due for replacement
• How many computers are due for replacement in X months?
• Auto-complete asset management forms, e.g.
disposal

Creating a contact directory from an org chart
“Hacking” centralised data • Linking data management to process management • Data visualisation

The concept
• An org chart is a list of positions linked to people
• A contact list is a list of people linked to contact data
• The people who maintain org charts are often the same people responsible for local contact lists
• What if I want a list of people sorted by where they work?

The concept
• DO NOT update contact details locally
  – Individuals must update their details in THEMIS
• Create links for positions in the org chart and link reporting lines
• Link positions to usernames and look up other details
• Export data for viewing
  – GraphML for the org chart
  – XML, HTML or PDF for the contact list

Challenge
• It is technically possible for THEMIS to export an XML data source for re-use (Find an Expert)
  – For various reasons this is not practical at this point in time
• How do I collect centrally managed contact information efficiently?
  – Active Directory?

Raw data: It’s not pretty, but it’s usable

What I’ve learnt
• Many people know the problems, but without a technical solution nothing happens
• Working smarter requires everyone to work together
  – Managers, workers, technical people
• Know when to give up
• Working smarter is contagious
• IT support ≠ Technical support

Discussion / Questions

© Copyright The University of Melbourne 2009