Ingest and Loading
DigiTool Version 3.0

Ingest Agenda
- Ingest Overview and Introduction
- Ingest activity steps
- Transformers
- Task Chains
- Upload of Files
- Ingest Management

DigiTool Modules
- Deposit
- Single & Bulk
- Search & Index
- Dispatcher & Viewers
- Approval
- Web Services

Ingest Module
Two main functions:
- Creation and submission of new ingest activities – bulk and individual
- Monitoring of ingest status (scheduled, running, success, etc.)
Ingest activities can be initiated directly from the Ingest application, by pre-defined templates for manual/automatic ingest started in the Deposit Module, or potentially by FTP feed.

Ingest Main Functions

Ingest Architecture
One loader, multiple transformers
- Transformer – takes objects and/or metadata as input and transforms them to the Repository digital entity representation.
- An ingest activity is a workflow that combines a certain transforming process and potential background tasks, and is followed by the generic loader.
- All loads (including batch) are processed as individual digital entities to control loading errors.

Example: Template-Based Transformer
[Diagram: Input -> Transformer and pre-ingest tasks -> Output (Digital Entities) -> Load to Repository]

Common Workflows
- Ingest Overview and Introduction
- Ingest activity steps
- Transformers
- Task Chains
- Upload of Files
- Ingest Management

Ingest Activity
A typical workflow for submission of an ingest activity:
1. Enter the activity name and the scheduled time for running, select the type of transformer, and determine the background tasks to run as part of the ingest activity.
2. Order/select background tasks into a task chain.
3. Select the Digital Entity template and select/verify background task parameters.
4. Point to the location of files or upload files.
5. Submit.

Step 1 – Ingest Types - Transformers
1. File stream(s) that will be loaded with no relationships
   Treats every file uploaded as a separate entity with no relationships. Each formed record will be separate upon ingest to the repository.
2. File stream(s) that will become part of one parent record
   Used to create relationships among the file(s) ingested. An additional "parent" record will be added that allows navigation between the file(s) loaded. Ultimately, each file will attain its own individual record, but with the option of viewing the "parent" record, which points to all of the stream(s) loaded.
3. File stream(s) utilizing the DigiTool file name convention
   Takes file stream(s) with filenames according to the DigiTool standard and, based on these filenames, automatically creates a hierarchical METS file for load into the repository.
4. MARC XML file and associated file stream(s)
   Takes a standard MARCXML file as input and loads each metadata record as a separate entity. The MARCXML file may contain links to file stream(s), local or remote, through the use of metadata tag placeholders that associate each file stream with its MARC record.
5. Dublin Core XML file and associated file stream(s)
   Takes a standard DCXML file as input and loads each metadata record as a separate entity. The DCXML file may contain links to file stream(s), local or remote, through the use of metadata tag placeholders that associate each file stream with its DC record.
6. Comma separated value (.csv) file
   Takes a standard .csv file along with appropriate mapping information and loads each row as a separate record. File stream(s) may also be uploaded as part of this transformer’s workflow.
7. METS XML file and associated file stream(s)
   Takes a METS XML file as input and decomposes it into single atomic units for proper ingest. The XML file may contain links to file stream(s), local or remote, and will be stored in the repository with all structural relationships defined, so that the compound object is recomposed upon delivery.
8. Exported DigiTool repository elements for ingest/re-ingest
   Takes digital entities that are already in the repository-recognized format and allows their ingest/re-ingest back into the repository.

Step 1 – Ingest Schedule and Assignment
Scheduling the ingest assignment is a required part of any ingest activity. Options include:
- As soon as possible
- Specified time and date
With the appropriate privileges, an activity can be assigned to other Staff users of the same Admin Unit. The default is assignment to the logged-in staff user.
Please note: The “assigned to” staff user for any ingest activity is the only one who can activate that activity.

Step 1 – What is a task?
A task is an action to be performed on the “transformed” digital entities and/or file stream(s) before ultimately ingesting the entire set of formed entities into the repository.

Step 1 – Task Chain Initiation
- Template based – Server-side templates representing a variety of pre-defined task chain combinations.
- New task chain – Allows a tailor-built task chain to be defined and ordered in Step 2 of the ingest activity.
- User-defined task chain – User-saved and defined task chain saved from a previous session. Any task chain can be saved as a user-defined task chain.

Available Task Chains
1. Empty Chain
2. Technical Metadata Extraction
3. Add Metadata
4. Control Section Attribute Assignment
5. Full Text Extraction
6. PDF Full Text Extraction
7. Add History Event
8. Tiff to JP2000 Converter
9. Remote Stream Download
10. Thumbnail Creation

Empty Chain - No task chain will be applied.
Technical Metadata Extraction - For recognized file stream(s), technical metadata will be extracted and mapped into standard technical metadata.
Add Metadata - Allows the linking or copying of a single metadata record which will be applied to all file stream(s) that are part of the ingest activity.
Control Section Attribute Assignment - Allows digital entity information to be defined on a one-by-one basis that will be applied to all digital entities that are part of the ingest activity.
Full Text Extraction - For recognized file stream(s), full text will be extracted as the source object’s manifestation.
PDF Full Text Extraction - For PDF file stream(s), full text will be extracted as the source object’s manifestation.
Add History Event - Allows additional entries of change history metadata to be added to the file stream(s) of an ingest activity.
Tiff to JP2000 Converter - Takes TIFF image(s) and creates a JPEG2000 manifestation of the source image.
Remote Stream Download - Defines the storage of URL stream(s): either copied to local storage or remaining remote.
Thumbnail Creation - For recognized file stream(s), a thumbnail image will be created as the source object’s manifestation.

Step 2 – Task Chain Definition and Order
Allows the staff user to pick and order the available tasks for the ingest activity.
The order of tasks is relevant for certain chains, e.g. Thumbnail and Full Text before the Technical Metadata extractor.

Step 3 – Template and Task Chain Parameters
Choose the Digital Entity template, e.g. marc_simple_entity_with_stream.xml when using the MARC transformer and wishing to load file stream(s) with the MARC records.
NOTE: Digital Entity templates are sensitive to the Transformer chosen in Step 1.
Set task parameters, e.g.:
- thumbnail height, width
- text language encoding for full text indexing
- MD insertion
- etc.

METS transformer - METS to D.E.
[Diagram: the METS transformer converts a METS file into Digital Entities. Labels in extraction order: Mets Header; Control Section; dmd & amd Sections (DL content if necessary); Descriptive/technical/rights/; Structural Map; Behavior/Struct Link; MD Section; File Section (URL editing); For each file in File Sec; Preservation; File structure MD; Linking; Digital Entity]

Step 4 – Local Files
Choose files for upload – Active-X plugin required:
- Easy to use
- Preview of icon/thumbnail during upload
- Send to server
- Preview/Manage files

Step 4 – Remote Files (URL)
Choose files for upload/linkage:
- URLs can be entered one by one or in batch from a text list
- Download now: store the URL file locally; otherwise, link to the remote location
- Preview/Manage files

Ingest Activity – File upload

Ingest folders
- Not scheduled – Ingest activities ready for activation that are not scheduled.
- Scheduled – Ingest activities set for ingest at a specified time and date.
- Running – Ingest activities that are actively running.
- Success – Ingest activities that have loaded successfully.
- Failed – Ingest activities that have not loaded successfully.

Ingest Management
- Edit, Delete and Activation
- Monitoring log files:
  - Task list – Shows all background tasks performed.
  - Task log – Full step-by-step log file for each ingest step.
  - Task summary – Overview of major steps of the ingest process, e.g. Pre-transformer, Transformer, Ingest.

Additional Functions
- Begin with upload of files before defining tasks/definitions for the ingest activity (for mass file upload).
- Pre-transformer – Transforms file stream(s) and/or metadata to the ingest-ready format so that a transformer can be initiated. Currently, METS Zip input from deposit is the only pre-transformer.
- Saving task chains to a personal user profile for future use.

Thank you!
www.exlibrisgroup.com
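
The "one loader, multiple transformers" design from the Ingest Architecture slide can be illustrated with a minimal Python sketch. It is not DigiTool code: every name in it (DigitalEntity, run_ingest, the two transformers, the two placeholder tasks, make_loader) is hypothetical and chosen only for illustration. It shows a transformer producing digital entities, a task chain applied in order (thumbnail before technical metadata, as noted in Step 2), and a generic loader that processes each entity individually so that one loading error does not stop the rest. Mapping a partially failed activity to "Failed" is an assumption; the slides do not say how partial failures are reported.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class DigitalEntity:
    """Hypothetical stand-in for the repository's digital entity representation."""
    label: str
    streams: List[str] = field(default_factory=list)
    metadata: Dict[str, object] = field(default_factory=dict)


Transformer = Callable[[Iterable[str]], List[DigitalEntity]]
Task = Callable[[DigitalEntity], None]
Loader = Callable[[DigitalEntity], None]


def no_relationship_transformer(files: Iterable[str]) -> List[DigitalEntity]:
    """Ingest type 1: every file becomes a separate entity with no relationships."""
    return [DigitalEntity(label=name, streams=[name]) for name in files]


def parent_record_transformer(files: Iterable[str]) -> List[DigitalEntity]:
    """Ingest type 2: an extra "parent" entity points to all loaded streams."""
    children = no_relationship_transformer(files)
    parent = DigitalEntity(
        label="parent",
        metadata={"children": [child.label for child in children]},
    )
    return [parent] + children


def thumbnail_task(entity: DigitalEntity) -> None:
    """Placeholder task: records a made-up thumbnail name for the first stream."""
    if entity.streams:
        entity.metadata["thumbnail"] = f"{entity.streams[0]}.thumb"


def technical_metadata_task(entity: DigitalEntity) -> None:
    """Placeholder task: records the stream count as 'technical metadata'."""
    entity.metadata["stream_count"] = len(entity.streams)


def make_loader(repository: List[DigitalEntity]) -> Loader:
    """Build a toy loader that rejects streams ending in '.bad' (assumption)."""
    def load(entity: DigitalEntity) -> None:
        if entity.label.endswith(".bad"):
            raise ValueError("unsupported stream")
        repository.append(entity)
    return load


def run_ingest(
    files: Iterable[str],
    transformer: Transformer,
    task_chain: List[Task],
    loader: Loader,
) -> Tuple[str, List[str], List[Tuple[str, str]]]:
    """Transform, run the task chain in order, then load each entity on its own."""
    loaded: List[str] = []
    failed: List[Tuple[str, str]] = []
    for entity in transformer(files):
        try:
            for task in task_chain:
                task(entity)
            loader(entity)
            loaded.append(entity.label)
        except Exception as exc:  # isolate the error to this one entity
            failed.append((entity.label, str(exc)))
    # Assumption: any failed entity marks the whole activity as "Failed".
    status = "Failed" if failed else "Success"
    return status, loaded, failed
```

Calling run_ingest(["a.tif", "b.bad", "c.pdf"], no_relationship_transformer, [thumbnail_task, technical_metadata_task], make_loader([])) loads the first and third files and reports the second as failed. Swapping in parent_record_transformer changes only how entities are formed; the task chain and loader are reused unchanged, which is the point of keeping a single generic loader.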