Identification with File

The “file” command (available from http://www.darwinsys.com/file/) is “a file type guesser”, that is, a command-line tool that tells you in words what kind of data a file contains. Unlike most GUI systems, command-line UNIX systems (with this program leading the charge) don’t rely on filename extensions to tell you the type of a file, but look at the file’s actual contents. This is, of course, more reliable, but requires a bit of I/O.

The File identification module relies on JNA (Java Native Access) to access File’s companion dynamic library: libmagic.

1 Installing File

1.1 Installing File on Windows

The File tool is available for Windows both through Cygwin and as a GNU Win32 package. As the Cygwin DLLs are not valid Windows DLLs, the GNU Win32 package is preferred. This section describes the installation of the GNU Win32 version of File.

1. Visit http://gnuwin32.sourceforge.net/ and download the File setup program (file-5.03-setup.exe at the time of this writing). Save the file to a convenient location.
2. Run the File setup program, accept the licence agreement and choose the installation directory for File.
3. On the Windows desktop, open the local computer properties, select the Advanced tab and click the “Environment Variables” button.
4. In the “Environment Variables” dialog, add a new user variable named “Path” whose value is the “bin” directory of the File installation directory. (This directory contains the libmagic DLL: magic1.dll.)
5. Go to the File tool site (http://www.darwinsys.com/file/) and download the File source code for the version found on GnuWin32 (e.g. file-5.03.tar.gz).
6. Open the compressed source code archive (tar.gz) and extract the whole Magdir directory (fileX.YY/magic/Magdir, where X.YY is the File version).
This directory contains the signature definitions for all the file types that File can identify. You can edit or add signature definition files to have more file types identified.

1.2 Installing File on Linux or Solaris

To install File on Linux or Solaris, download the File source code (from ftp://ftp.astron.com/pub/file/) and compile it. By default, “make install” installs the package’s commands under “/usr/local/bin”, include files under “/usr/local/include”, etc. To use an installation prefix other than “/usr/local”, give “configure” the option “--prefix=PREFIX”; the package then uses PREFIX as the directory for installing programs and libraries. For example:

    $ ./configure --prefix=/home/web_expl/file/5.04 && make && make install

1.3 Installing File on Mac OS X

Mac OS X ships with the File tool but not with the libmagic dynamic library. You can get libmagic using MacPorts (formerly known as DarwinPorts).

1. Visit http://www.macports.org/ and download the package installer for your version of Mac OS X (Snow Leopard, Leopard or Tiger) from http://www.macports.org/install.php, section “Mac OS X Package (.pkg) Installer”.
2. Run the installer to install MacPorts.
3. Open a Terminal window.
4. Run the “sudo port -v selfupdate” command to get the latest revisions of the port files.
5. Run the “sudo port install file” command to install the File tool.
   Note: the port for the File tool includes a variant, “with_text_magic_file”, that installs the signature definition files. Unfortunately, this variant is broken in the File 5.04 port. You can try to install File with this variant using “sudo port -v install file +with_text_magic_file”. If the installation fails, run the “sudo port -f uninstall file” command, repeat step 5 and continue with steps 6 and 7 below.
6. Go to the File tool site (http://www.darwinsys.com/file/) and download the File source code for the version found on MacPorts (e.g. file-5.04.tar.gz).
7. Open the compressed source code archive (tar.gz) and extract the whole Magdir directory (fileX.YY/magic/Magdir, where X.YY is the File version). This directory contains the signature definitions for all the file types that File can identify. You can edit or add signature definition files to have more file types identified.

2 Configuring JHOVE2 File identification module

The config directory contains files that configure many options for the JHOVE2 framework and application. This section describes the minimal configuration required to get the File identification module working.

2.1 Activate the File identification module

The identify module configuration file is config/spring/module/identify/jhove2-identify-config.xml. Edit this file to change the sourceIdentifier property so that it points to the File identification module instead of DROID. Find the first <property> tag with ref="DROIDIdentifier", duplicate the line, comment out the original and change the copy to use the File identification module: ref="LibmagicIdentifier". After editing, the bean definition must read as follows:

    <!-- Identifier module bean -->
    <bean id="IdentifierModule" class="org.jhove2.module.identify.IdentifierModule" scope="prototype">
      <property name="developers">
        <list value-type="org.jhove2.core.Agent">
          <ref bean="CDLAgent"/>
          <ref bean="PorticoAgent"/>
          <ref bean="StanfordAgent"/>
        </list>
      </property>
      <!-- property name="sourceIdentifier" ref="DROIDIdentifier"/ -->
      <property name="sourceIdentifier" ref="LibmagicIdentifier"/>
      <property name="shouldSkipIdentifyIfPreIdentified" value="true"/>
    </bean>

2.2 Configure the File identification module

The File identification module configuration file is config/spring/module/identify/file/jhove2-identify-file-config.xml. It contains the following bean definition:

    <bean id="LibmagicIdentifier" class="org.jhove2.module.identify.file.LibmagicIdentifier" scope="singleton" init-method="init" destroy-method="shutdown">
      <property name="developers">
        <list value-type="org.jhove2.core.Agent">
          <ref bean="BnFAgent"/>
        </list>
      </property>
      <!-- Point the magicFileDir property to a directory containing
           Magic definition files to force the compilation of these
           files when the LibmagicIdentifier starts (once per JVM run).
           If this property is not set, the system-provided (on UNIX
           and Linux systems) definitions will be used. -->
      <!-- property name="magicFileDir" value="classpath:file/Magdir"/ -->
    </bean>

Edit this file to uncomment the magicFileDir property and make it point to the directory containing the signature definition files. The value of the property can be either a file system path (e.g. /usr/local/share/misc/magic) or a directory relative to the Java VM classpath; in the latter case, the path must be prefixed with “classpath:”. For example, if the Magdir directory of the File source distribution was copied into a subdirectory named “file” of the JHOVE2 main configuration directory (“config”), which the jhove2 (.bat/.cmd/.sh) command adds to the Java VM classpath, the value of the magicFileDir property must be “classpath:file/Magdir”.

2.3 Notes for Linux

On Linux you can also choose to use the existing File installation, which is most likely already present on your system. Just leave the magicFileDir property commented out in jhove2-identify-file-config.xml and the wrapper will use the installed magic files.
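
As an illustration of the two forms the magicFileDir value can take (section 2.2), here is a minimal Python sketch of how such a value might be resolved to a directory. It is not the JHOVE2 implementation: the function name resolve_magic_dir and its classpath_dirs parameter are hypothetical, and the sketch assumes that every classpath entry is a plain directory (no JAR files).

```python
import os
import tempfile
from typing import Iterable, Optional

CLASSPATH_PREFIX = "classpath:"


def resolve_magic_dir(value: str, classpath_dirs: Iterable[str]) -> Optional[str]:
    """Return the directory designated by a magicFileDir value, or None.

    Hypothetical helper: mimics the documented behaviour, not JHOVE2 code.
    """
    if value.startswith(CLASSPATH_PREFIX):
        relative = value[len(CLASSPATH_PREFIX):]
        # Try each classpath directory in order; the first match wins.
        for root in classpath_dirs:
            candidate = os.path.normpath(os.path.join(root, relative))
            if os.path.isdir(candidate):
                return candidate
        return None
    # No prefix: the value is treated as a plain file system path.
    return value if os.path.isdir(value) else None


def demo() -> tuple:
    """Build a throwaway config/file/Magdir layout and resolve it both ways."""
    with tempfile.TemporaryDirectory() as config_dir:
        magdir = os.path.join(config_dir, "file", "Magdir")
        os.makedirs(magdir)
        via_classpath = resolve_magic_dir("classpath:file/Magdir", [config_dir])
        via_path = resolve_magic_dir(magdir, [])
        missing = resolve_magic_dir("classpath:file/Missing", [config_dir])
        return via_classpath == magdir, via_path == magdir, missing is None


if __name__ == "__main__":
    print(demo())
```

In this sketch, a value that resolves to no existing directory yields None, which corresponds to the case where the system-provided definitions would have to be used instead.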