=====================
= Word Count 1.00 =
=====================
Features
========
- Count of lines, characters, non-whitespace characters, words, distinct
  words and unique words.
- Average length of words, distinct words and unique words.
- Sorted word lists with frequencies.
- Word length distribution histograms.
- Code page awareness.
- "Quick scan" mode.
- Multiple filespecs/wildcards.
- WordStar 6.0 document format support.
Files
=====
WC.EXE    Word Count executable file
WC.DOC    Word Count documentation (this file)
WC.CRO    Addendum to the documentation for Croatian users

System requirements
===================
PC XT 8086/8088 or compatible
128 kb of free conventional RAM
Hard/floppy disk
MS-DOS/PC-DOS 3.30 or later
Conventions
===========
Word
    A sequence of characters, each of which belongs to either the
    primary or the secondary word set.

    The primary word set is [a-zA-ZÇ-Üá-Ñ] if the /cp option is not
    specified (that is, [a-zA-Z] plus the extended characters with
    codes 128-154 and 160-165, shown here as they appear in code
    page 437), or [a-zA-Z] plus whichever national character set
    you use if it is. (Sets are given in the regular expression
    syntax used by many UNIX or UNIX-like utilities, most notably
    GREP.)

    The secondary word set is [0-9_']. For instance,

        wouldn't
        is_ascii
        Lotus123

    are all eight-character words. Since each word has to contain
    at least one character from the primary set,

        1995
        '95
        _1995_

    are NOT words.
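
The word rule above can be sketched in a few lines of Python. This is only an illustration, not the code used by WC.EXE: the names `find_words`, `CANDIDATE` and `HAS_PRIMARY` are made up for the example, and the primary set is assumed to be plain [a-zA-Z], with the national characters left out.

```python
import re

# Assumption: ASCII-only primary set; national characters are omitted
# so that the sketch does not depend on any code page.
PRIMARY = "a-zA-Z"
SECONDARY = "0-9_'"

# A candidate is a maximal run of primary or secondary characters.
CANDIDATE = re.compile(f"[{PRIMARY}{SECONDARY}]+")
# A candidate counts as a word only if it holds a primary character.
HAS_PRIMARY = re.compile(f"[{PRIMARY}]")


def find_words(text):
    """Return the words of `text`, in order of appearance."""
    return [c for c in CANDIDATE.findall(text) if HAS_PRIMARY.search(c)]


if __name__ == "__main__":
    print(find_words("wouldn't is_ascii Lotus123 1995 '95 _1995_"))
```

Run on the six examples above, the sketch keeps the first three and drops the last three, since those contain secondary characters only.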

Distinct word
    A word counted once, no matter how many times it appears in
    the text. For instance, the sentence

        Up, up, and away!

    contains three distinct words: "UP", "AND", and "AWAY".
    Comparison of words is case insensitive by default.

Unique word
    A word that appears exactly once in the whole text. The
    sentence

        Up, up, and away!

    contains two unique words: "AND" and "AWAY".
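
The difference between distinct and unique words is easy to see in code. The following Python sketch is an illustration under assumptions, not the program's actual method: `distinct_and_unique` is a made-up name, the input is an already split list of words, and Python's `str.upper()` stands in for the upcasing that Word Count performs.

```python
from collections import Counter


def distinct_and_unique(words, case_sensitive=False):
    """Return (distinct words, unique words), each sorted."""
    # Assumption: upcasing with str.upper() approximates the default
    # case insensitive comparison.
    keys = list(words) if case_sensitive else [w.upper() for w in words]
    freq = Counter(keys)
    distinct = sorted(freq)                                  # every word once
    unique = sorted(w for w, n in freq.items() if n == 1)    # seen exactly once
    return distinct, unique


if __name__ == "__main__":
    print(distinct_and_unique(["Up", "up", "and", "away"]))
```

For the sentence above this gives three distinct words and two unique ones. With `case_sensitive=True`, "Up" and "up" are kept apart, so all four words are both distinct and unique.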

Absolute freq.
    Count of appearances of a single word in the text.

Relative freq.
    Absolute frequency divided by the total count of words in the
    text.
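
As a small worked example of the two frequencies, here is a Python sketch. The function name `frequencies` is hypothetical, and words are upcased with `str.upper()` as a stand-in for the default case insensitive comparison.

```python
from collections import Counter


def frequencies(words):
    """Map each word to (absolute frequency, relative frequency)."""
    counts = Counter(w.upper() for w in words)
    total = sum(counts.values())   # total count of words in the text
    return {w: (n, n / total) for w, n in counts.items()}


if __name__ == "__main__":
    for word, (absolute, relative) in frequencies(["Up", "up", "and", "away"]).items():
        print(word, absolute, relative)
```

In "Up, up, and away!" the word "UP" has an absolute frequency of 2 and a relative frequency of 2/4.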

Character
    ASCII character in the range 32-255, or TAB (ASCII 9).

Non-whitespace
    ASCII character in the range 33-255.

Line
    Sequence of characters terminated by EOL (ASCII 13/10) or by
    the end of file. The number of lines in the text equals the
    count of EOLs plus one.
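
The three counting rules can be sketched as follows. This is an illustration only: `count_text` is a made-up name, and "EOL (ASCII 13/10)" is read here as the two-byte CR LF pair, which is an assumption about the program's behavior.

```python
def count_text(data: bytes):
    """Return (characters, non-whitespace characters, lines)."""
    # Characters: codes 32-255, plus TAB (code 9).
    chars = sum(1 for b in data if b >= 32 or b == 9)
    # Non-whitespace characters: codes 33-255.
    non_ws = sum(1 for b in data if b >= 33)
    # Lines: number of EOLs plus one (assumption: EOL is CR followed by LF).
    lines = data.count(b"\r\n") + 1
    return chars, non_ws, lines


if __name__ == "__main__":
    print(count_text(b"Up, up,\r\nand away!"))
```

Note that under the "EOLs plus one" rule an empty file counts as one line, and a file that ends with an EOL counts one more line than it has EOLs.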
Usage
=====
WC files [options]

/cp                        Code page support
/h                         Show histogram
/hd                        Show histogram for distinct words
/l[f|s|l|u][@|@@<file>]    List used words
                           [w/Freq|Sorted by freq|by Length|Unique]
                           [write|append to specified file]
/q                         Quick scan
/s[s]                      Case Sensitive scan [& Sort]
/ws                        WordStar 6.0 document

Multiple filespecs and wildcards are allowed.
Multiple files are processed as a single large file.
Order of options/filenames is not important.
Options are case-insensitive.
Available options
=================
/cp
    Enables the use of the current code page settings, thus
    providing support for national alphabets. All characters are
    upcased according to the uppercase table, and word lists are
    sorted according to the collating sequence table, as provided
    by DOS.

    >>>> Note that if a text was written under one code page
    setting and is counted under another, its statistics are
    likely to be misleading. However, 7-bit texts (that is, those
    containing only ASCII characters in the range 0-127) are not
    affected.

/h
    Shows histogram. Classifies all used words by their lengths,
    and prints the corresponding absolute and relative
    frequencies.

/hd
    Same as above, but shows histogram for distinct words.
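
The word length distribution behind /h and /hd can be sketched as below. This is not the program's code: `length_histogram` and its `distinct` flag are hypothetical, and only the numbers are computed, since this document does not specify the printed layout.

```python
from collections import Counter


def length_histogram(words, distinct=False):
    """Return rows of (word length, absolute freq., relative freq.)."""
    if distinct:
        # Assumption: distinct words are compared case insensitively,
        # approximated here with str.upper().
        words = {w.upper() for w in words}
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    return [(length, n, n / total) for length, n in sorted(counts.items())]


if __name__ == "__main__":
    for row in length_histogram(["Up", "up", "and", "away"]):
        print(row)
```

With `distinct=True` each distinct word contributes once, so "Up" and "up" together add a single entry to the two-letter class.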

/l
    Prints a sorted plain list of all used words.

/lf
    Prints a sorted list of all used words with their
    corresponding frequencies.

/ll
    Prints a list of all used words, sorted by ascending word
    lengths.

/ls
    Prints a list of all used words with their corresponding
    frequencies, sorted by descending frequencies.

/lu
    Prints a sorted plain list of unique words.
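
The five list options differ only in what is kept and how it is ordered. The Python sketch below shows one possible reading. `word_lists` is a made-up name, plain string order stands in for the DOS collating sequence, and the way ties are broken in /ll and /ls (alphabetically here) is an assumption, as this document does not state it.

```python
from collections import Counter


def word_lists(words):
    """Return the contents of each list option, keyed by its letters."""
    freq = Counter(w.upper() for w in words)
    alphabetical = sorted(freq)
    with_freq = [(w, freq[w]) for w in alphabetical]
    return {
        "l": alphabetical,                                   # plain sorted list
        "lf": with_freq,                                     # sorted, with frequencies
        "ll": sorted(alphabetical, key=len),                 # ascending word length
        "ls": sorted(with_freq, key=lambda p: -p[1]),        # descending frequency
        "lu": [w for w in alphabetical if freq[w] == 1],     # unique words only
    }


if __name__ == "__main__":
    for option, content in word_lists(["Up", "up", "and", "away"]).items():
        print("/" + option, content)
```

Python's sort is stable, so words of equal length or equal frequency stay in alphabetical order in this sketch.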

/q
    Performs a quick scan. Roughly 30 percent faster than the
    default, with minimal memory requirements, but distinct/unique
    word stats, the distinct words histogram and the list options
    are not available.

/s
    Case sensitive scan and case insensitive sort. Words are
    considered distinct even if they differ in capitalization
    only.

/ss
    Case sensitive scan and sort.

/ws
    Scans text in WordStar 6.0 (and hopefully 7.0) document
    format.

    >>>> Documents created with WordStar version 4.0 or earlier
    will *not* be processed correctly, as their format differs
    from that of WS 6.0. The same might apply to WS versions 5.0
    and 5.5; I didn't have an opportunity to check that.

Redirection
    All generated lists can be redirected to a file. For example,
    specifying

        /lu@analysis

    writes a list of all unique words to a file named "analysis".
    If the file already exists, it is overwritten. Specifying

        /lu@@analysis

    does the same, except that the output is appended to the end
    of the file. Note that no blanks are allowed either between
    @/@@ and the output file name, or between @/@@ and the switch
    character.
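
The redirection syntax can be captured with one regular expression. The following Python sketch is an illustration, not the parser used by WC.EXE: the names `LIST_OPTION` and `parse_list_option` are made up, and the returned mode letters "w" and "a" are just a convenient way to say "overwrite" and "append".

```python
import re

# /l, optionally followed by f, s, l or u, optionally followed by
# @ or @@ and a file name, with no blanks anywhere.
LIST_OPTION = re.compile(r"^/l([fslu]?)(?:(@@?)(\S+))?$", re.IGNORECASE)


def parse_list_option(arg):
    """Return (kind, filename, mode), or None if `arg` does not fit."""
    m = LIST_OPTION.match(arg)
    if m is None:
        return None
    kind, marker, filename = m.groups()
    mode = {None: None, "@": "w", "@@": "a"}[marker]   # write or append
    return kind.lower(), filename, mode


if __name__ == "__main__":
    for arg in ("/lu@analysis", "/lu@@analysis", "/lf", "/lu@"):
        print(arg, parse_list_option(arg))
```

A blank after @ or @@ splits the option into two command line arguments, so the first part ends in a bare @ and is rejected here.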

Option:          Is incompatible with:

List option      Any other list option
List option      /q
/q               /hd
/s               /ss
/cp              /ss
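
A check for conflicting options could look like the Python sketch below. It is an illustration only, and the pairs reflect one reading of the incompatibility table: a list option conflicts with any other list option and with /q, /q conflicts with /hd, and /ss conflicts with both /s and /cp. The names `LIST_OPTIONS`, `CONFLICTS` and `find_conflicts` are hypothetical.

```python
LIST_OPTIONS = {"/l", "/lf", "/ll", "/ls", "/lu"}
# Assumption: pairs as read from the incompatibility table.
CONFLICTS = [("/q", "/hd"), ("/s", "/ss"), ("/cp", "/ss")]


def find_conflicts(options):
    """Return the conflicting groups found among `options`."""
    # Options are case insensitive; drop any @file or @@file part.
    opts = {o.lower().split("@")[0] for o in options}
    lists = sorted(opts & LIST_OPTIONS)
    found = []
    if len(lists) > 1:                  # two different list options
        found.append(tuple(lists))
    if lists and "/q" in opts:          # a list option together with /q
        found.append((lists[0], "/q"))
    found += [pair for pair in CONFLICTS if set(pair) <= opts]
    return found


if __name__ == "__main__":
    print(find_conflicts(["/lu@out", "/Q"]))
    print(find_conflicts(["/h", "/cp"]))
```

An empty result means that no conflict from the table was found.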
Limitations
===========
Word length
    Maximum word length is 64 characters. Longer words are
    truncated.

Frequencies
    Maximum word frequency is 65535.

Filenames
    The number of filespecs on the command line is not limited,
    but the program can process a maximum of 128 files. Remaining
    files are ignored.

Memory
    Each distinct word takes up (9+length) bytes of memory. The
    program itself occupies around 50 kb. Therefore, PCs with
    620 kb of free conventional memory will probably not be able
    to process texts containing more than 35000 distinct words.
    With the /q option, however, there are no memory limitations.
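
The memory estimate can be reproduced with a little arithmetic. The Python sketch below assumes that 1 kb means 1024 bytes and that the average distinct word is about 8 characters long. Both are assumptions made for the example, and `max_distinct_words` is a made-up name.

```python
PROGRAM_BYTES = 50 * 1024   # "around 50 kb" for the program itself
PER_WORD_OVERHEAD = 9       # each distinct word takes 9 + length bytes


def max_distinct_words(free_kb, avg_word_length):
    """Rough upper bound on the number of distinct words that fit."""
    available = free_kb * 1024 - PROGRAM_BYTES
    return max(0, int(available // (PER_WORD_OVERHEAD + avg_word_length)))


if __name__ == "__main__":
    print(max_distinct_words(620, 8))
```

With 620 kb free, about 570 kb remain for words. At 17 bytes per word that is a little over 34000 distinct words, in line with the figure of 35000 quoted above.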
Error messages
==============
Insufficient memory/Out of memory
    There is no memory left to store (additional) distinct words.
    Try to make more conventional RAM available. If nothing works,
    specify the /q option.

Can't open
    The specified file is not there, or its name is invalid.
    Probably caused by a misspelled filename or path. Another
    possible reason is that you have typed a blank between @/@@
    and the output file name.

Can't create
    The output file name is invalid, or the disk is full, or the
    file already exists and is marked read-only, or the floppy
    disk is not ready.

Error reading
    The file exists and was opened successfully, but a problem
    occurred during the read operation. This is probably due to a
    media malfunction.

Conflicting options
    The command line contains two or more options which perform
    conflicting actions (see "Available options").

Invalid option
    The command line contains an invalid option. Check the syntax.

In the event of error, Word Count terminates with the DOS error level
set to 1.
License
=======
You may freely use, copy and distribute Word Count as long as:
1) It is distributed in its original, unmodified form, including the
   documentation.
2) It is not sold for profit. A reasonable fee to cover handling
   expenses is permissible.
You are encouraged to upload Word Count to your local BBS or ftp server.
If you use this program and find it of value, a contribution of $7
(business/commercial users: $15) or any amount will be appreciated.
Contact address
===============
All questions, comments or suggestions concerning Word Count and its use
are more than welcome. Send your mail to the author:
Branko Radovanovic
Josipa Seissela 44
10010 Zagreb
CROATIA
e-mail:
br31187@pinus.cc.etf.hr
or
br31187@pinus.cc.fer.hr