DSpace Import, Export & Backup Mukesh A Pund Scientist NISCAIR Backup Vs. Export/Import Backup is meant for guarding the data from disk crash, virus attack, hacking or any calamity Export/Import is meant for exchange of digital objects across repositories Hardware Required for Backup Any one of the following CD-ROM/DVD-ROM DAT Drive External Hard disk Another system on the LAN DSpace Directory Structure /dspace/assetstore /dspace/assetstore (bitstreams – most important) /dspace/bin (commands to be used at command line, can always be generated from dspace-source files /dspace/config (you might have customized it, one time backup is good enough) /dspace/handle-server DSpace Directory Structure /dspace/lib (can always be generated from dspacesource) /dspace/logs (essential to generate statistical reports and bug tracing) /dspace/reports (can be generated from Log files) /dspace/search (can be regenerated using index-all command) Where DSpace stores data /dspace/assetstore directory will have all the Bitstreams and licenses PostgreSQL databases contains information on Communities Collections e-groups E-persons, thier passwords Host of other information What should be Backedup Your DSpace postgreSQL database /dspace/assetstore (minimum backup) /dspace (entire directory) Creating Backup Directory Create one directory where backup files will be stored Eg #mkdir /dspacebkp #chmod 777 /dspacebkp tar Command (compress) To back up /dspace directory $tar zcvf /dspacebkp/dspace150508.tar.gz /dspace To back up only /dspace/assetstore $tar zcvf /dspace/asset150508.tar.gz /dspace/assetstore Untar (uncompress) To untar and unzip the tar.gz file, you may use the following command $tar zxvf /dspacebkp/dspace150508.tar.gz WARNING:The safer approach is to use the above command in temp directory and copy it to dspace directory only after successfully untaring the file Backup of database The following commands are for Postgresql database backup Run pg_dump as dspace user Ex: $ su - /dspace /usr/local/pgsql/bin/pg_dump dspace > /dspacebkp/dspace_db_150508 Backup of database Where dspace is name of the database /dspacebkp/dspace_db_150508 file is backup file in which all the table definitions and contents will be stored Restoring the backup data One can use any of the following commands: psql command OR pg_restore Restoring the Database WARNING: You do not need to restore, unless your data got corrupted. Not to be used as a routine Of course backup should be done periodically Using psql to Restore $ psql -d dspace -f /dspacebkp/dspace_db_150508 Where dspace is the name of database dspace_db_150508 is the backup file taken on 15th May 2008. Using pg_restore to Restore pg_restore -d dspace /dspacebkp/dspace_db_150508 More options of pg_restore can be explored by $ man pg_restore Export/Import in Dspace Not to be used as a backup mechanism Export and import deal only with bitstreams, metadata, license and handles You can export or Import An item A collection or Export dsrun org.dspace.app.itemexport.ItemExport \ --type=COLLECTION \ --id=collID \ --dest=dest_dir \ --number=seq_num \ Where --type can have either the value COLLECTION or ITEM --id is the handle/collection_or_Item_Id ex: 1849/2 (or 123456789/2 in case you do not have handle) --dest is destination directory, if does note exist create --number is sequence number it can be just 1 Importing dsrun org.dspace.app.itemimport.ItemImport \ --add \ --eperson=dspace@localhost.localdomain \ --collection=collectionID \ --source=items_dir --mapfile=mapfile Where --add or --replace or --remove --mapfile can be used later to remove uploaded items What is exported The following files will be created for every item dublin_core.xml ( metadata) Handle ( one line having the handle number) license.txt Actual file ( bitstream: could be pdf or doc or an image file) Contents (with two lines – license file name, and actual bitstream name)