Various File and Disk Usage Utilities
Submitted to
Dr. Leangsuksun
Computer Science Department
Louisiana Tech University
Ruston, LA
By
Tamor Ursin
Computer Science
11 May 2004
Ursin
2
Introduction
What are file and disk usage utilities? File utilities can be organized into three major categories: file management, file information, and file compression utilities. File management utilities help manage files of various types on the Linux operating system. File information utilities display the information requested by the appropriate command, such as the beginning of a file or the system calls a program makes. Finally, file compression utilities compress and decompress files.
Disk usage utilities provide four main commands, which allow users to check disk space and file system usage, inspect files, and flush data held in memory to disk. Throughout this paper I will further discuss and give examples of two major utilities from each of the file management, file information, and file compression categories, and of four utilities from the disk usage category.
File Utilities
File Management
Two major file management utilities are rsync and chown. Rsync is a program that behaves much like rcp, but it has many more options and uses the rsync remote-update protocol to greatly speed up file transfers when the destination file already exists. The rsync protocol allows rsync to transfer only the differences between two sets of files across the network link, using an efficient checksum-search algorithm. Rsync's specialty is synchronizing file trees across networks; however, it also works well for copying files on a single computer.
Major additional features of rsync include support for copying links, devices, owners, groups, and permissions; exclude and exclude-from options similar to those of GNU tar; and a CVS exclude mode for ignoring the same files that CVS would ignore. For example, suppose you have a directory called source and you want to create a backup of it in destination. This is how the process would be accomplished:
rsync -a source/ destination
The -a (archive) option copies recursively and preserves symbolic links, permissions, and times, as well as groups and owners where permitted. The trailing slash on source/ copies the contents of source into destination rather than creating destination/source.
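The delta idea behind the rsync protocol can be illustrated with a small Python sketch. It follows the general approach rsync is known for (a cheap rolling checksum finds candidate matching blocks, and a strong hash confirms them), but it is a simplified single-machine illustration, not rsync's code or wire format. The 4-byte block size, the use of MD5 as the strong hash, and the names weak_checksum, roll, delta, and patch are assumptions made for the example.

```python
import hashlib

M = 1 << 16  # both halves of the weak checksum are kept modulo 2**16


def weak_checksum(block: bytes) -> tuple[int, int]:
    """Compute the two-part weak checksum (a, b) of a block directly."""
    n = len(block)
    a = sum(block) % M
    b = sum((n - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(ab: tuple[int, int], out_byte: int, in_byte: int, n: int) -> tuple[int, int]:
    """Slide an n-byte window one byte forward in O(1) instead of recomputing."""
    a, b = ab
    a = (a - out_byte + in_byte) % M
    b = (b - n * out_byte + a) % M
    return a, b


def delta(old: bytes, new: bytes, block_size: int = 4) -> list:
    """Describe `new` as references to full blocks of `old` plus literal bytes."""
    table = {}
    for start in range(0, len(old) - block_size + 1, block_size):
        blk = old[start:start + block_size]
        strong = hashlib.md5(blk).digest()
        table.setdefault(weak_checksum(blk), []).append((start // block_size, strong))

    ops, literal, i, ab = [], bytearray(), 0, None
    while i + block_size <= len(new):
        window = new[i:i + block_size]
        if ab is None:
            ab = weak_checksum(window)
        # A weak match is only a candidate; the strong hash confirms it.
        match = next((no for no, strong in table.get(ab, [])
                      if hashlib.md5(window).digest() == strong), None)
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("block", match))
            i += block_size
            ab = None  # recompute from scratch at the next window
        else:
            literal.append(new[i])
            if i + block_size < len(new):
                ab = roll(ab, new[i], new[i + block_size], block_size)
            i += 1
    literal.extend(new[i:])  # tail shorter than one block
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(old: bytes, ops: list, block_size: int = 4) -> bytes:
    """Rebuild the new file from the old file and the delta."""
    out = bytearray()
    for kind, value in ops:
        if kind == "block":
            out += old[value * block_size:(value + 1) * block_size]
        else:
            out += value
    return bytes(out)
```

For any two byte strings, patch(old, delta(old, new)) should equal new; in a real transfer only the literal runs and block numbers would need to cross the network.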
The second major utility is chown, which is built on the chown() system call. The system call changes the owner of the file specified by a path (or, for fchown(), by an open file descriptor fd). Only the super-user may change the owner of a file. The owner of a file may change the group of the file to any group of which that owner is a member; the super-user may change the group arbitrarily. If the owner or group is specified as -1, that ID is not changed. When the owner or group of an executable file is changed by a non-super-user, the S_ISUID and S_ISGID mode bits are cleared. POSIX does not specify whether this should also happen when root does the chown; the Linux behavior depends on the kernel version. In the case of a non-group-executable file (one with the S_IXGRP bit clear), the S_ISGID bit indicates mandatory locking and is not cleared by a chown. A practical use of the command is to change the owner and group of a file at the same time:
chown owner:group filename
GNU chown also accepts the older owner.group form. To change the owner and group of a directory and of every file beneath it at the same time, add the -R (recursive) option:
chown -R owner:group directory
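Python's os.chown wraps the same chown() system call, including the convention that -1 leaves an ID unchanged, so the command's behavior can be sketched roughly as below. The helper names resolve_uid, resolve_gid, and change_owner are hypothetical, the sketch does not reproduce every GNU chown option, and changing ownership still requires the privileges described above.

```python
import grp
import os
import pwd


def resolve_uid(user) -> int:
    """Map a user name or numeric UID to a UID; None means 'leave unchanged' (-1)."""
    if user is None:
        return -1
    if isinstance(user, int):
        return user
    return pwd.getpwnam(user).pw_uid


def resolve_gid(group) -> int:
    """Map a group name or numeric GID to a GID; None means 'leave unchanged' (-1)."""
    if group is None:
        return -1
    if isinstance(group, int):
        return group
    return grp.getgrnam(group).gr_gid


def change_owner(path, user=None, group=None, recursive=False):
    """Rough analogue of `chown [-R] user:group path` built on os.chown.

    Symbolic links are changed themselves rather than followed
    (follow_symlinks=False, which uses lchown-style behavior on Linux).
    """
    uid, gid = resolve_uid(user), resolve_gid(group)
    os.chown(path, uid, gid, follow_symlinks=False)
    if recursive and os.path.isdir(path) and not os.path.islink(path):
        # os.walk does not descend into symlinked directories by default
        for root, dirs, files in os.walk(path):
            for name in dirs + files:
                os.chown(os.path.join(root, name), uid, gid, follow_symlinks=False)
```

A call such as change_owner("directory", "owner", "group", recursive=True) corresponds to the chown -R example, with "owner" and "group" standing for real names on the system.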
File Information
The strace utility traces the system calls and signals of a program. In the simplest case, strace runs the specified command until it exits. It intercepts and records the system calls made by a process and the signals the process receives. The name of each system call, its arguments, and its return value are printed on standard error or to the file specified with the -o option. Strace is a useful diagnostic, instructional, and debugging tool. System administrators, diagnosticians, and troubleshooters find it invaluable for solving problems with programs whose source is not readily available, since programs do not need to be recompiled in order to be traced. Students, hackers, and the overly curious will find that a great deal can be learned about a system and its system calls by tracing even ordinary programs. Programmers find that, since system calls and signals are events that happen at the user/kernel interface, a close examination of this boundary is very useful for bug isolation, sanity checking, and attempting to capture race conditions. The command line takes one of two general forms, as given in the synopsis of its manual page:
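1. strace [ -dffhiqrtttTvxx ] [ -acolumn ] [ -eexpr ] ... [ -ofile ] [ -ppid ] ...
   [ -sstrsize ] [ -uusername ] [ -Evar=val ] ... [ -Evar ] ...
   [ command [ arg ... ] ]
2. strace -c [ -eexpr ] ... [ -Ooverhead ] [ -Ssortby ] [ command [ arg ... ] ]
The first form prints a full trace. The second form, with -c, instead counts the time, calls, and errors for each system call and prints a summary when the program exits. For example, strace -o trace.txt ls writes a trace of the ls command to trace.txt, while strace -c ls prints only the summary table.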
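Because each traced call is printed in a regular name(arguments) = return value pattern, a trace saved with -o can be post-processed. The Python sketch below is a rough, count-only imitation of the -c summary (it measures no time). The regular expression and the function name summarize are assumptions, and line forms such as the unfinished and resumed calls that appear with -f are simply skipped.

```python
import re
import sys
from collections import Counter

# Matches lines of the form: name(args) = retval [ERRNO (message)]
# An optional "[pid N]" or "N " prefix (added when following forks) is skipped.
SYSCALL_LINE = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(?P<name>\w+)\((?P<args>.*)\)\s+=\s+"
    r"(?P<ret>\S+)(?:\s+(?P<errno>E[A-Z0-9]+))?"
)


def summarize(lines):
    """Count calls and failed calls per system call name."""
    calls, errors = Counter(), Counter()
    for line in lines:
        m = SYSCALL_LINE.match(line)
        if not m:
            continue  # signal lines ("--- SIG... ---"), exit lines ("+++ ... +++"), etc.
        calls[m["name"]] += 1
        if m["errno"]:
            errors[m["name"]] += 1
    return calls, errors


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: strace -o trace.txt ls, then: python summarize.py trace.txt
    with open(sys.argv[1]) as f:
        calls, errors = summarize(f)
    for name, count in calls.most_common():
        print(f"{count:8d} {errors[name]:8d}  {name}")
```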
The second file information utility I am introducing is head, which has a twin named tail that prints the last lines of a file instead. By default, head prints the first 10 lines of each file to standard output. When more than one file is given, head precedes the output for each file with a header giving the file name. Arguments that are mandatory for the long options are mandatory for the short options too. The head command has options to print the first bytes of a file instead of lines, to print a chosen number of lines instead of the first 10, to never print headers giving file names, and, with the verbose option, to always print them. These options are as follows:
1. -c, --bytes=SIZE: print the first SIZE bytes
2. -n, --lines=NUMBER: print the first NUMBER lines instead of the first 10
3. -q, --quiet, --silent: never print headers giving file names
4. -v, --verbose: always print headers giving file names
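The behavior of these four options can be sketched in a few lines of Python. This is a minimal illustration rather than GNU head itself: the function name and parameters are hypothetical, and the "==> name <==" header with a blank line between files mirrors the format GNU head prints.

```python
import sys


def head(paths, lines=10, nbytes=None, quiet=False, verbose=False, out=None):
    """Minimal sketch of head's -n, -c, -q and -v behavior."""
    if out is None:
        out = sys.stdout
    # Headers appear for multiple files unless quiet, and always when verbose.
    show_headers = verbose or (len(paths) > 1 and not quiet)
    for index, path in enumerate(paths):
        if show_headers:
            # A blank line separates each file's section from the previous one.
            out.write(("\n" if index else "") + f"==> {path} <==\n")
        with open(path, "rb") as f:
            if nbytes is not None:
                data = f.read(nbytes)  # like -c: the first nbytes bytes
            else:
                # like -n: zip stops after `lines` lines without reading further
                data = b"".join(line for _, line in zip(range(lines), f))
        out.write(data.decode(errors="replace"))
```

Calling head(["a.txt", "b.txt"], lines=1) would print the first line of each file under its own header, much like head -n 1 a.txt b.txt.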
File Compression
There are four major Linux file compression utilities. The two that will be emphasized are bzip2 (with its decompression counterpart, bunzip2) and gzip. Using bzip2 and bunzip2 is as simple as using any other command-line tool. There are switches that can be used with the main command, but typical usage needs none. The most important thing to remember is that bzip2 compresses and bunzip2 decompresses. If you have a file named todays_payroll and you need it compressed with bzip2, run the command bzip2 todays_payroll, which replaces it with the file todays_payroll.bz2. To decompress the new file, run the command bunzip2 todays_payroll.bz2, and the original file will reappear intact. Whereas bzip2 uses Burrows-Wheeler block-sorting compression followed by Huffman coding, gzip uses Lempel-Ziv coding (LZ77). This technique replaces a string segment that has already appeared earlier in a file with a short numeric reference to that earlier occurrence. The algorithm is complex and doesn't offer an enormous upside in file size reduction. A 14-character test string, abaabaaabbabb, that I compressed using Lempel-Ziv dropped to 13 characters, 0a0b1a2b1ab45.
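To make the Lempel-Ziv idea concrete, here is a textbook-style LZ77 sketch in Python that emits (offset, length, next character) triples. It is not gzip's format: gzip's DEFLATE combines LZ77-style matching with Huffman coding, and real implementations use hash chains instead of this brute-force search. The window and max_len values and the function names are arbitrary assumptions, and its triples are not written in the same notation as the test string result above.

```python
def lz77_compress(data: str, window: int = 255, max_len: int = 15) -> list[tuple[int, int, str]]:
    """Encode `data` as (offset, length, next_char) triples, textbook LZ77 style."""
    out = []
    i = 0
    while i < len(data):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):  # brute-force search of the window
            length = 0
            # Matches may run past i (overlap); always leave one char for next_char.
            while (length < max_len and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decompress(triples: list[tuple[int, int, str]]) -> str:
    """Copy `length` chars from `offset` back, then append next_char."""
    out: list[str] = []
    for offset, length, next_char in triples:
        start = len(out) - offset
        for k in range(length):
            out.append(out[start + k])  # one at a time, so overlapping copies work
        out.append(next_char)
    return "".join(out)
```

A long run such as "aaaaaaaaaa" shrinks to two triples, because the second triple copies from one character back and overlaps itself; text with few repeats barely shrinks at all.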
I compressed a 34-MB file with bzip2 down to 11 MB; gzip compressed the same file to 12 MB but took nearly half the time. The bzip2 utility rearranges the data within each block so that the block as a whole compresses better, while gzip simply replaces repeated strings with shorter references. Because gzip doesn't quite match the compression ratio of bzip2 but compresses much faster, gzip is best suited for on-the-fly compression where the smallest possible size is not an issue. Other than speed, gzip has one other benefit over bzip2: it can work with multiple formats. Whereas bzip2 handles only its own .bz2 format, gunzip can decompress .gz, .Z, and .tgz files, as well as .zip archives that contain a single member.
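Python's standard library wraps both formats in its gzip and bz2 modules, so the speed and size trade-off described above can be measured on your own data. The sketch below is only a measuring harness: the function name and the sample data are hypothetical, both modules default to their highest compression level, and actual ratios and timings depend heavily on the input.

```python
import bz2
import gzip
import time


def compare(data: bytes) -> dict:
    """Return {format: (compressed_size, seconds)} for gzip and bzip2."""
    results = {}
    for name, compress, decompress in (
        ("gzip", gzip.compress, gzip.decompress),
        ("bzip2", bz2.compress, bz2.decompress),
    ):
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        if decompress(packed) != data:  # both formats must round-trip losslessly
            raise RuntimeError(f"{name} round trip failed")
        results[name] = (len(packed), elapsed)
    return results


if __name__ == "__main__":
    sample = b"todays_payroll,employee,hours,rate\n" * 20000  # hypothetical data
    for name, (size, secs) in compare(sample).items():
        print(f"{name}: {len(sample)} -> {size} bytes in {secs:.4f} s")
```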
The bzip2recover utility (part of the bzip2 package) can recover data from .bz2 files that were damaged by transmission errors or faulty media. It is most useful on larger .bz2 files, because a larger file contains more blocks, each of which can be recovered independently. To attempt recovery, run the command bzip2recover file_name. Each recovered block is written to a separate file whose name begins with rec00001, rec00002, and so on (the number identifies the extracted block), followed by the original file name.
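Recovery of this kind is possible because a .bz2 file is a sequence of independently compressed blocks, each beginning with the 48-bit marker 0x314159265359. The Python sketch below only locates those markers; since blocks are not aligned to byte boundaries, it searches bit by bit. It does not write rec files or validate block contents as the real tool's workflow does, and the function name is hypothetical.

```python
import sys

# 48-bit marker at the start of every bzip2 block (the BCD digits of pi).
BLOCK_MAGIC = format(0x314159265359, "048b")


def find_block_starts(data: bytes) -> list[int]:
    """Return the bit offsets at which bzip2 block headers appear.

    Blocks are not byte-aligned, so the search runs over a string of bits.
    A match is only a candidate; this sketch does not validate block contents.
    """
    bits = "".join(format(byte, "08b") for byte in data)
    starts = []
    pos = bits.find(BLOCK_MAGIC)
    while pos != -1:
        starts.append(pos)
        pos = bits.find(BLOCK_MAGIC, pos + 1)
    return starts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as f:
        offsets = find_block_starts(f.read())
    print(f"{len(offsets)} candidate blocks at bit offsets {offsets}")
```

The first block always starts at bit 32, right after the 4-byte "BZh" plus block-size header; a larger file yields more offsets, which is why bzip2recover pays off mainly on large files.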
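Both gzip and gunzip accept a number of switches. Three of the most useful are:
-N: When compressing, always save the original file name and time stamp (the default); when decompressing, restore them if present.
-r: Recursively compress (or decompress) the files in a directory.
-c: Write output to standard output and keep the original files unchanged.
The -c switch can be combined with shell redirection to concatenate two files into a single compressed file, but it must be used with caution: using > instead of >> in the second step would overwrite file.gz. The process requires two steps:
1. Step 1: gzip -c file1 > file.gz
2. Step 2: gzip -c file2 >> file.gz
Running gunzip -c file.gz then prints the contents of file1 followed by those of file2.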
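The two steps can be mirrored with Python's standard gzip module, which makes the role of appending explicit: each step adds a separate gzip member to the same file, and gzip.decompress (like gunzip) reads concatenated members as one stream. The helper name is hypothetical; the file names follow the example.

```python
import gzip


def append_gzip_member(src_path: str, dest_path: str) -> None:
    """Rough equivalent of `gzip -c src >> dest`.

    The source file is left in place (as with -c), and its compressed form is
    appended to dest as a new gzip member.
    """
    with open(src_path, "rb") as src:
        member = gzip.compress(src.read())
    # "ab" appends like >>; opening with "wb" would truncate dest like >
    with open(dest_path, "ab") as dest:
        dest.write(member)
```

Calling append_gzip_member("file1", "file.gz") and then append_gzip_member("file2", "file.gz") reproduces steps 1 and 2, provided file.gz does not already exist before step 1.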
Disk Usage Utilities
There are four main commands associated with disk usage: df, du, stat, and sync. The df command reports disk usage for all mounted filesystems. The du command reports disk usage for the current directory and all of its subdirectories; du -h shows the results in human-readable form (for example, K for kilobytes and M for megabytes). The stat command displays information about the specified files, and sync writes data buffered in memory out to disk.
With no arguments, df reports the space used and available on all currently mounted filesystems (of all types). Otherwise, df reports on the filesystem containing each argument file. Normally the disk space is printed in units of 1024 bytes, but this can be overridden. Non-integer quantities are rounded up to the next higher unit.
If an argument file is a disk device file containing a mounted filesystem, df shows the space available on that filesystem rather than on the filesystem containing the device node. GNU df does not attempt to determine the disk usage on unmounted filesystems, because on most kinds of systems doing so requires extremely nonportable, intimate knowledge of filesystem structures.
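The figures df prints come from the statvfs() system call, which Python exposes as os.statvfs. The sketch below computes df-style Size, Used, and Available columns in 1024-byte units, rounding partial units up as described above. The function name is hypothetical, and GNU df's exact column choices may differ in detail.

```python
import os


def df_kib(path: str = "/") -> tuple[int, int, int]:
    """Return (size, used, available) for the filesystem holding `path`, in KiB."""
    st = os.statvfs(path)

    def to_kib(blocks: int) -> int:
        # ceiling division: any partial 1024-byte unit counts as a full one
        return -(-(blocks * st.f_frsize) // 1024)

    size = to_kib(st.f_blocks)
    used = to_kib(st.f_blocks - st.f_bfree)  # f_bfree counts all free blocks
    avail = to_kib(st.f_bavail)              # free blocks usable by ordinary users
    return size, used, avail


if __name__ == "__main__":
    print("1K-blocks      Used Available")
    print("%9d %9d %9d" % df_kib("/"))
```

Used plus Available is often less than Size, because some free blocks are typically reserved for the super-user and are counted in f_bfree but not in f_bavail.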
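The du command reports the disk space used by each given file and, for directories, by all of their subdirectories; with no file arguments, it reports on the current directory. Normally the disk space is printed in units of 1024 bytes, but this can be overridden. Non-integer quantities are rounded up to the next higher unit.
The syntax for this command is:
du [ option ] ... [ file ] ...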
On BSD systems, du reports sizes that are half the correct values for files that are NFS-mounted
from HP-UX systems. On HP-UX systems, it reports sizes that are twice the correct values for
files that are NFS-mounted from BSD systems. This is due to a flaw in HP-UX; it also affects the
HP-UX du program.
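The core of what du does can be sketched in Python: walk the tree, add up the space actually allocated to each file, and count a file with several hard links only once. This is a simplified stand-in for du -s under stated assumptions: the function name is hypothetical, and it relies on st_blocks being reported in 512-byte units, as Python documents it.

```python
import os


def du_kib(path: str) -> int:
    """Sketch of `du -s path`: allocated disk space in KiB, hard links counted once."""
    seen = set()
    total = 0

    def add(p: str) -> None:
        nonlocal total
        st = os.lstat(p)                # lstat: measure symlinks themselves, do not follow
        key = (st.st_dev, st.st_ino)    # identifies a file regardless of how many names it has
        if key not in seen:
            seen.add(key)
            total += st.st_blocks * 512  # st_blocks is in 512-byte units

    add(path)
    if os.path.isdir(path) and not os.path.islink(path):
        for root, dirs, files in os.walk(path):
            for name in dirs + files:
                add(os.path.join(root, name))
    return -(-total // 1024)            # round a partial KiB up
```

Because it sums allocated blocks rather than file lengths, a small file usually counts as a whole filesystem block, which is also why du and ls can disagree about sizes.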
The stat command reports detailed information about the given files, such as size, permissions, ownership, inode number, and time stamps. With the -f option, it reports on the filesystems the given files are located on instead. If a file is a symbolic link, the -L option makes stat report on the file the link points to.
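Python's os.lstat and os.stat return the same fields the stat command prints, and the difference between them matches the link behavior described above: lstat describes the link itself, while stat follows it, as stat -L does. The function name and the selection of fields are illustrative assumptions.

```python
import os
import stat
import time


def describe(path: str, follow_links: bool = False) -> dict:
    """Collect some of the fields `stat` shows; follow_links mimics stat -L."""
    st = os.stat(path) if follow_links else os.lstat(path)
    return {
        "file": path,
        "size": st.st_size,
        "blocks": st.st_blocks,
        "mode": stat.filemode(st.st_mode),  # e.g. "-rw-r--r--" or "lrwxrwxrwx"
        "inode": st.st_ino,
        "links": st.st_nlink,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "modified": time.ctime(st.st_mtime),
    }
```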
The sync command writes any data buffered in memory out to disk. This can include modified superblocks, modified inodes, and delayed reads and writes. This must be implemented by the kernel; the sync program does nothing but exercise the sync system call. The kernel keeps data in memory to avoid doing (relatively slow) disk reads and writes. This improves performance, but if the computer crashes, data may be lost or the filesystem corrupted as a result. Running sync forces everything buffered in memory to be written to disk.
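Programs can request the same flushing themselves. Python exposes the system-wide sync() call as os.sync (Unix only) and the per-file fsync() call as os.fsync; a program that must not lose one particular file usually calls fsync on that file rather than syncing everything. The helper name and file name below are hypothetical.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data and ask the kernel to flush it to disk before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's user-space buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write this file's data to the device


if __name__ == "__main__":
    import tempfile
    path = os.path.join(tempfile.mkdtemp(), "example.dat")  # hypothetical file
    write_durably(path, b"payroll totals\n")
    os.sync()  # like the sync command: flush all buffered data system-wide
```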
Conclusion
File utilities are organized into three major categories: file management, file information, and file compression utilities. I have shown that file management utilities help manage files of various types on the Linux operating system, that file information utilities display the information requested by the appropriate command, and that file compression utilities compress and decompress files.
Disk usage utilities provide four main commands, which allow users to check disk space and file system usage, inspect files, and flush data held in memory to disk. Throughout this synopsis, I gave two major utilities from each of the file management, file information, and file compression categories, and four major utilities from the disk usage category.