Identification with File
The “file” command (available at http://www.darwinsys.com/file/) is “a file type guesser”,
that is, a command-line tool that tells you in words what kind of data a file contains. Unlike
most GUI systems, command-line UNIX systems, with this program leading the charge,
do not rely on filename extensions to determine the type of a file; they look at the file’s actual
contents. This is more reliable, but requires a bit of I/O.
The File identification module relies on JNA (Java Native Access) to access File’s companion
dynamic library, libmagic.
1 Installing File
1.1 Installing File on Windows
The File tool is available for Windows both through Cygwin and as a GnuWin32 package.
As the Cygwin DLLs are not valid Windows DLLs, the GnuWin32 package is preferred.
This section describes the installation of the GnuWin32 version of File.
1. Visit http://gnuwin32.sourceforge.net/ and download the File setup program (file-5.03-setup.exe at the time of this writing). Save the file to a
convenient location.
2. Run the File setup program, accept the licence agreement and choose the installation
directory for File.
3. On the Windows desktop, open the local computer properties, select the Advanced tab and
click the “Environment Variables” button.
4. In the “Environment Variables” dialog, add a new user variable named “Path” whose
value is the “bin” directory of the File installation directory. If a user “Path” variable
already exists, append the directory to it, separated by a semicolon. (This directory contains
the libmagic DLL, magic1.dll.)
5. Go to the File tool site (http://www.darwinsys.com/file/) and download the File source code
for the version found on GnuWin32 (e.g. file-5.03.tar.gz).
6. Open the compressed source archive (tar.gz) and extract the whole Magdir directory (file-X.YY/magic/Magdir, where X.YY is the File version). This directory contains the
signature definitions for all the file types that File can identify. You can edit or add
signature definition files to have more file types identified.
1.2 Installing File on Linux or Solaris
To install File on Linux or Solaris, download the File source code (from
ftp://ftp.astron.com/pub/file/) and compile it.
By default, “make install” installs the package’s commands under “/usr/local/bin”, include
files under “/usr/local/include”, etc. To use an installation prefix other than
“/usr/local”, give “configure” the option “--prefix=PREFIX”; the package then uses PREFIX
as the base directory for installing programs and libraries. For example:
$ ./configure --prefix=/home/web_expl/file/5.04 && make && make install
1.3 Installing File on Mac OS X
The File tool ships with Mac OS X, but the libmagic dynamic library does not.
You can get libmagic using MacPorts (formerly known as DarwinPorts).
1. Visit http://www.macports.org/ and download the package installer for your version of Mac
OS X (Snow Leopard, Leopard or Tiger) from http://www.macports.org/install.php, section
“Mac OS X Package (.pkg) Installer”.
2. Run the installer to install MacPorts.
3. Open a Terminal window.
4. Run the “sudo port -v selfupdate” command to get the latest revisions of the port files.
5. Run the “sudo port install file” command to install the File tool.
Note: the port for the File tool includes a variant “with_text_magic_file” that installs the
signature definition files. This variant is broken in the File 5.04 port. You can try to install
File with this variant using “sudo port -v install file +with_text_magic_file”. If
the installation fails, run the “sudo port -f uninstall file” command and continue with
steps 6 and 7 below.
6. Go to the File tool site (http://www.darwinsys.com/file/) and download the File source code
for the version found on MacPorts (e.g. file-5.04.tar.gz).
7. Open the compressed source archive (tar.gz) and extract the whole Magdir directory (file-X.YY/magic/Magdir, where X.YY is the File version). This directory contains the
signature definitions for all the file types that File can identify. You can edit or add
signature definition files to have more file types identified.
2 Configuring the JHOVE2 File identification module
The config directory contains files that configure many options for the JHOVE2 framework
and application. This section describes the minimal configuration required to get the
File identification module working.
2.1 Activate the File identification module
The identify module configuration file is config/spring/module/identify/jhove2-identify-config.xml.
Edit this file to change the sourceIdentifier property so that it points to the File identification module
instead of DROID. Find the first <property> tag with ref="DROIDIdentifier", copy the line,
comment out the original and modify the copy to reference the File identification module:
ref="LibmagicIdentifier".
After editing, the bean definition should read as follows:
<!-- Identifier module bean -->
<bean id="IdentifierModule"
      class="org.jhove2.module.identify.IdentifierModule" scope="prototype">
  <property name="developers">
    <list value-type="org.jhove2.core.Agent">
      <ref bean="CDLAgent"/>
      <ref bean="PorticoAgent"/>
      <ref bean="StanfordAgent"/>
    </list>
  </property>
  <!-- property name="sourceIdentifier" ref="DROIDIdentifier"/ -->
  <property name="sourceIdentifier" ref="LibmagicIdentifier"/>
  <property name="shouldSkipIdentifyIfPreIdentified" value="true"/>
</bean>
2.2 Configure the File identification module
The File identification module configuration file is config/spring/module/identify/file/jhove2-identify-file-config.xml.
<bean id="LibmagicIdentifier"
      class="org.jhove2.module.identify.file.LibmagicIdentifier"
      scope="singleton" init-method="init" destroy-method="shutdown">
  <property name="developers">
    <list value-type="org.jhove2.core.Agent">
      <ref bean="BnFAgent"/>
    </list>
  </property>
  <!-- Point the magicFileDir property to a directory containing
       Magic definition files to force the compilation of these files
       when the LibmagicIdentifier starts (once per JVM run). If this
       property is not set, the system-provided (on UNIX and Linux
       systems) definitions will be used.
  -->
  <!-- property name="magicFileDir" value="classpath:file/Magdir"/ -->
</bean>
Edit this file to uncomment the magicFileDir property and make it point to the directory
containing the signature definition files. The value of the property can be either a file system
path (e.g. /usr/local/share/misc/magic) or a directory relative to the Java VM classpath;
in the latter case, the path must be prefixed with “classpath:”.
For example, if the Magdir directory of the File source distribution was copied into a
subdirectory named “file” of the JHOVE2 main configuration directory (“config”), which the
jhove2 (.bat/.cmd/.sh) command adds to the Java VM classpath, the value of the
magicFileDir property must be “classpath:file/Magdir”.
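As a minimal sketch of the two forms described above, the property inside the LibmagicIdentifier bean might look like the following. The classpath value is the one from the example above. The file system path reuses the illustrative path mentioned earlier and is only an assumption about where the definitions live on a given system. Only one of the two variants should be active at a time.

```xml
<!-- Fragment to place inside the LibmagicIdentifier bean definition -->

<!-- Variant 1: Magdir copied to config/file and resolved through the classpath -->
<property name="magicFileDir" value="classpath:file/Magdir"/>

<!-- Variant 2 (alternative, assumed location): absolute file system path
<property name="magicFileDir" value="/usr/local/share/misc/magic"/>
-->
```

To switch variants, comment out the active property and uncomment the other; leaving both commented falls back to the system-provided definitions, as the comment in the configuration file notes.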
2.3 Notes for Linux
On Linux you can also choose to use the existing File command, which is most likely already installed
on your system. Just leave the magicFileDir property commented out in jhove2-identify-file-config.xml and the wrapper will use the installed magic files.