CALIFORNIA STATE UNIVERSITY, NORTHRIDGE AN

advertisement
CALIFORNIA STATE UNIVERSITY, NORTHRIDGE
AN
EXPERI~illNTAL
CURRICULUM FOR TEXT PROCESSING
A project submitted in partial satisfaction of the
requirements for the degree of Master of Science in
Computer Science
by
G1 en Wi 1 1 i ams
~
June, 1979
The Project of Glen Williams is approved:
P r o f e s s o r lVIo t i l
Professor Walsh
Professor Abbott
California State University, Northridge
i i
CONTENTS
Contents .............. .
Abstract ......••
The Objectives . . . . . . . . . . . . . . . . .
The Realities ..
TECO • . . . • . . . . . •
iii
.iv
. .... 1
.13
.14
• • • • • • • • 21
• • 24
.32
The Editor Assignment. ..........•..
An Aside On PASCAL.
Pattern Matching ..
Formatting . . . . . . . . . .
• ••• 3 7
. ... 4 2
Typesetting . . . . . . . . .
. ...•. 4 7
First Generation Typesetters ...... .
.52
TTS . . . . . . . . . . . . . . . . . . . . . · · · · · · · · · · · · ·
The Advent of Photocompostion..........
.54
Second Generation Principles.........
.......
.57
Third Generation Typesetters.
.....
. . . . . . . . . 61
Compos i t i on • . . . • . . . . . .
. •••.
. . • •. . . . . . •.
•66
Hyphenation.........................
. 70
Input To A Typesetter.......
........
......
.74
Pagination.........
.76
Word Process in~....
. .... ...... .... ........
. .80
Office Automation.
....•..
.............
..85
Con c 1 us i on . . .
... ..
..........
. . 91
Bibliography.
......
. .
.........
. .96
Appendix A (III 1070 Editor User Guide).
.97
Appendix B (A Sample "Read" Program) . . . . . . . . . . . . . . . . . . . . 100
Appendix C (CDC Character Set) . . . . • . . . . . . . . . . . . . . . . . . . . . 103
Appendix D (A Set Manipulation Program)..
.....
. .. 105
Appendix E (Editor Specification).............
. ... 107
Appendix F ("FINDL" Pattern-Matching Program)
... 117
Appendix G ("FINDA" Pattern-Matching Program)
.. 121
Appendix H (Flow of RUNOFF Justification).....
. .. 125
Appendix I (Text Processing Midterm and Final)
... 127
i i i
ABSTRACT
AN EXPERIMENTAL CURRICULUlVI FOR TEXT PROCESSING
by
G 1 en Wi 11 i ams
Master of Science in Computer Science
This
project
details
the
definitions
and
implementation
of
an experimental curriculum for an upper
division class
in
Text
Science
Department
at
Processing
within
2)
to
experiment
Computer
CSUN in order 1) to determine what
information should be presented about this
and
the
with
a
emerging
field
method of presenting that
information.
The class was offered in the Spring of 1979 under
the
direction of the objectives defined in the first section of
this paper.
arose
In teaching the
concerning
class,
certain
realizations
the
implementation details specified to
meet these objectives.
As often as not, these details were
not predicted.
Therefore,
divisions:
this
paper
consists
of
two
conceptual
a statement of the a priori objectives;
description
of
interspersed
with
the
material
remarks
iv
actually
and a
covered
on the class's reaction to the
material.
Topics covered include the
processing
use
of
TECO
as
a
text
language, the implementation considerations for
a text editor
program,
considerations
for
a
pattern
matching,
implementation
formatting program, the history and
development of typesetting and phototypesetting devices and
composition
software,
word processing, electronic message
systems and the automated office.
text
editor
The implementation of
a
was assigned as a project, which necessitated
some discussion of the PASCAL programming language.
v
The Objectives
In considering the nature of
aspects
may
somewhat
come
to
loosely
d i s c i p 1 in e s
mind.
defined
"text
Text
in
processing"
processing
relation
to
computers
information).
more
is as yet
the
other
that together camp r i s e "Camp u t e r S c i en c e" •
has been with us in one form or another
first
many
(in
the
almost
since
It
the
form of holerith cards carrying
Yet it existed almost 'by the leave' of
the
urgent tasks at hand -- the loading of data (programs
included) into the
computer.
involves
its
many
of
Even
now
text
sister disciplines.
processing
For instance,
pattern recognition is involved in the process of capturing
text
during
OCR
while
data compaction can be invoked in
storing characters in a DBMS (as
course,
text
processing
may
with
processors
and
And,
of
be said to 'take over' such
heretofor non-computer disciplines
word
ADABAS).
as
editors)
typewriting
and
(e.g.,
typesetting
(phototypesetters).
The
faster
text
than
processing
most
other
field
is
currently
expanding
areas of computer science in the
form of the automated office (wherein text processing plays
a
central
role),
electronic
message
systems,
and
document/data processing interfaces.
The automated office and the Electronic Message System
( ElVIS )
are
c 1o s e 1 y r e 1at e d , as i t i s t h e EMS t h a t p r o v i de s
the capability for sharing and storing data from heretofore
isolated
word
processors.
The
1
EMS
also
enables
the
2
travelling executive to maintain
intimate
ties
organization at any time of day.
His time is saved in that
recipients of his communications need not be
with
his
"present 11
to
be reached.
Another new linking of disciplines is introduced
the
relation
of
document
Document processors are
link
to
form:
for
data
data
processing to data processing.
adding
processing
the documet
11
with
capabilities
applications
to
directly
in an interesting
hides 11 binary information in text files
processing
up-to-date reports.
programs
to
utilize
in
providing
For instance, the latest figures in
a
company's financial report may exist in a text file that is
accessed by
data-processing
programs
to
make
financial
projections.
Therefore the objectives of a
seek
to
identify
the
reach
of
text
processing
text
processing.
introducing the field, a great deal of emphasis
on
class
is
In
placed
knowing the problems inherent in a well engineered text
editor.
This
appreciation
knowledge
for
the
for its manipulation.
equips
the
student
with
an
nature of 'text' as well as methods
The rest of this
section
discusses
the objectives in more detail.
TECO
This course therefore introduces the students to TECO,
a
powerful
character-oriented text editor and progrrunming
language sold by Digital Equipment Corporation.
TECO
may
4
smallest addressable unit of text.
an
introduction
specify
and
discussion
a
an editor.
SOS is a
where
a
implement
particular,
SOS.
to
SOS
mentioned
as
of the various ways to
text-processing
program:
in
Augmenting this is a discussion of
progrrun
stands
They are
written
for
"SON
at
Stanford
OF STOPGAP".
University,
It is so nruned
because it is meant to bridge the gap between line-oriented
editors (like LINED at DEC) and TECO.
Stopgap is primarily
line-oriented in that one may address lines as the smallest
addressable
unit of text.
However, it differs from simple
line-oriented editors in its 'Alter' command.
allows
which
the
he
user
may
to
move
character-cursor
enter
an intra-line editing mode in
through
moving
This command
the
commands,
current
and
line
with
of course, he may
modify individual character strings within the line without
retyping the whole line.
The Editor Project
The purpose of placing the editor
beginning
of
the
course
a
The class is required to
functional specification of an editor:
specifying the editor he will
trauma,
discussions
for
at
the
implement.
of
how
to
implement
To
ease
their
following few sessions will
each
feature
internal routines may be defined complimentarily
addressed.
hand
each student
center on what a reasonable editor should incorporate.
subject
the
is to allow the assignment of a
simple editor as a project.
in
discussion
The
and how the
are
also
5
To be specific, some considerations are:
1.
The language the editor is to be written in
2.
What type of device will it
smart terminal?)
3.
Line editor or character editor?
The choice
affects the cursor-positioning commands at the
disposal of the us.er
4.
How will user get a display of his text?
5.
How is the cursor represented?
6.
Will the cursor position follow typeout (as
with XEDIT, a line-oriented editor for CDC
machines)?
7.
How is the cursor moved?
8.
What form
commands?
9•
How many commands wi 11 the user be able to
enter at one time if commands are entered in
line-mode?
the
program
gets
i.e. '
characters from the terminal only after a
'break' character is entered.
10.
What form should insert take and how should it
work? That is, should text be inserted before
or after the cursor? What bearing does this
have with entering text before the first line
or after the last line?
11.
Should the delete command delete characters or
lines·or both?
12.
How do we divide a file into pages--or
we?
13.
What form should the editing buffer and/or the
Q-registers take?
14.
How is text entered into a Q-register (a
Q-register is an unbounded temporary storage
area for saving text)
15.
How is text retrieved from
how is this specified?
16.
Searching
should
it
search
around
end-of-lines?
should it search backwards?
what happens to the cursor in the event of
either failure or success? Should there be
class-matches for pattern elements?
wi 11
commands
run
on
(TTY
or
In what increments?
take?
a
One-letter
should
Q-register
and
6
17.
Help command. How will provision be made for
allowing a user to get a refresher of which
commands are available, etc.?
18.
Meta characters: How does the user specify
characters that are to be in the output file
if the character is not in the keyboard's
character
set?
Meta characters are one
possibility.
For instance, SOS uses
the
question-mark as a meta character.
To
insure
the
milestone-style
completion
assignments
are
of
this
editor,
made
throughout
the
semester.
Pattern Matching
A subject of great importance within
is
pattern
matching.
Concerns
within
text
processing
pattern matching
theory include minimizing the number of operations required
for an average search and matching classes of patterns that
can be defined
simple
as
expressions.
In
introducing
the procedure FIND
described by Horowitz and Sahni 1 is examined.
that
is
This procedure runs in time O(mn) where m is the length
of
the
pattern
regular
pattern
and
matching,
n
the
length
algorithm is obviously quadratic.
can
easily
Sahni 2 ,
of
the string.
From this
Such an
procedure
we
move to an improved FIND, also in Horowitz and
--------iElll;-Horrowitz and Sartaj Sahni, Fundamentals of
Data Structures (Woodland Hills, californTa:---computer
ScTence-Press~-Tnc., 1976), p.
191.
2 Ibid., p. 192.
7
that introduces the short-cut of
last
characters
length
is
terminating
greater
characters in string.
also
running
the
first
and
of the pattern with their counterparts in
the string and also of
pattern
checking
in
than
This
time
the
when
the
the number of remaining
algorithm,
O(mn),
search
while ·potentially
actually
exhibits
better
performance in actual useage.
From concerns of speed it is possible to move on to a
discussion
of
"A
Fast Searching Algorithm" 3 .
This
algorithm introduces more preprocessing of the pattern as a
means
of reducing the number of characters examined in the
course of a search.
that
never
we
progressing
cycling
It employs a
move
through
through
technique·
backwards
in
string.
The
the
the
for
ensuring
string
actual
while
loop
for
the string is also shortened by employing
two arrays that provide the information of how far
we
can
"slide" in the string as a function of either the character
at
the
cursor
subpatterns
or
within
as
the
a
function
pattern.
of
reoccurrances
of
This technique has been
found to be sublinear in the number of comparisons required
to find a successful match.
"A fast
of the
ocTober-;-
8
attern-class
search
as
matching
found
"wild-card"
techniques
in
In
SNOBOL.
matches
as
to
employed
speed such types of
discussing
in TECO, the subject of
very general pattern searches may be broached.
a
pattern
of
up
to
36
characters.
position of the pattern we may
class
of
so-called
specify
characters as a match.
In
any
TECO allows
any character
character
For instance a control-X
character entered as a character for a given position
cause
TECO
to
accept
~gy
character as a match.
will
One may
also specify to accept any element of an enumerated set
characters.
or
of
By prefacing a specification with a control-N
character, we search for any match except that specified by
the following specification.
patterns is
or
rejected
t~at
as
a character in the string can be
a
than
7
machine
The
instructions.
reason, of course, is due to the first pass TECO makes
to encode the pattern.
the
accepted
match in one machine instruction.
search loop itself is less
The
The beauty of TECO's wildcard
student
Thus it should
become
obvious
to
that a small amount of preprocessing can save
one or two orders of
magnitude
on
operations,
depending
upon the length and complexity of the pattern.
The obvious
drawback is that we cannot make searches
contain
that
an
elipsis.
A very general pattern search on the order
of
SNOBOL
is coded and described by Kernighan and Plauger 4 .
--------i~;i;~-W. Kernighan and P.J.
Plauger, Software
Tools, (Reading, Mass.: Addison-Wesley Publishing Company~
TY7oT, pp.
135-161.
9
Their
pattern
expressions
pattern
matching
except
which
of
the
considerations.
regular
all
covers
patterns of the form x(albc)y, the
matches
preprocesses
by
algorithm
either
xay
or
but
pattern,
xbcy.
It
also
for
any
speed
not
It is mainly to make the classes
the actual matching logic.
readable
Their treatment is valuable
to the student in that it clearly delineates all the
involved
in
implementing a sophisticated (in the sense of
handling closure) pattern matcher.
treatement
as well
It gives
an
extensive
to both the building the pattern representation
as
possible
to
the
leads
utility
for
a
of
recursion
closure
in
match.
following
Following
treatment of matching, the idea of searching and
in
the
their
same
work
student:
steps
operation
is
perhaps
all
their
is detailed.
its
block-structured language:
Finally,
a
to
written
the
in
avid
a pseudo
RATFOR.
discussion
characteristics
are
replacing
The final beauty of
accessability
routines
the
on
blending
speed
the
of the Boyer and Moore sublinear searching
algorithm with the generality of either TECO's or Kernighan
and
Plauger's
probes
pattern matching.
most
of
the
important aspects of
Such a blend is indeed possible
but
to
my knowledge, not yet realized in a commercial product.
Text Formatting
Kernighan and Plauger also give a
detailed,
program. 5
discussion
and
example
of
fine,
a
if
not
too
text formatting
10
Incorporating material
in
that
from~£!!~~~~!££!~
is
useful
it allows discussion of fine details by examining
the example program developed therein.
Two prime functions of a text formatter
to
do
are
covered:
"page layout" and horizontal justification.
these discussions appropriate corrmands to
allow
Within
the
user
control over these functions arise, as well as the criteria
for their implementation.
However, their logic for justification
in
that
it
does
evenly as does
RUNOFF
not
wanting
seem to distribute extra spaces as
RUNOFF.
employs
seems
Therefore,
the
is examined in detail.
exact
algorithm
Examples of RUNOFF
input files are also given in order to make
the
sense
above
is
of
certain commands more concrete.
The reason page layout was in
neither
RUNOFF
quotes
that
nor the Kernighan and Plauger program deal
with page layout as thought of by printers.
Page layout to
RUNOFF is managing to get headers and footnotes as well
as the text on the page in a manner
Page
layout
as
envisioned
complicated
and
is
only
corrmercial
phototypsetting
by
to
pl~asing
printers
recently
being
concerns.
is
a
page
One
does
much
more
by
illustrative
a
photograph
and asking a typesetting program to "flow" the
text around it, say in a double-column format.
RUNOFF
eye.
attempted
problem is that of specifying a position for
on
the
not
But
while
attempt such a task, a discussion of its
pp.
219-250.
11
concerns prepares the student to
deal
with
many
of
the
concerns of real typesetting.
Computer Driven Phototypesetting
The major text
for
the
Modern
typesetting
by
Reports" fame).
material
from
E!9.!!i~IE~!!..!.
Seybold
Fundamentals
(of
Concepts in this text are
Electronic
Market 7
by
published
John
is
In
the
of
"Seybold
augmented
with
and
addition,
Information
progranming
International,
manuals
Inc.
are
referenced when dealing with typesetting programs.
The sequence of the Seybold book is followed
closely.
The
first six chapters deal with the history of
printing as well as the various characteristics of
These
chapters are important in various ways:
this facet of text processing
aware
of
printing,
the
complications
whether
mechanically.
The
within the field.
the
relations
it
enn.:
is
main
student
inherent
created
in
"type".
in studying
must
be
made
dealing
with
electro-optically
or
complication is non-standardness
are
relative
(and
are not necessarily transitive.) Aside from
Seybold
7 Electronic
PF~g d~~~~!~l~ ~Tf MQ~~Tr~ g~~QQ~i!iQ~
and
1
uu ICaLIOns,
Egui~ment-JmarKef--(New--york:
rgvvy~---
the
Almost all measures
-(lV_I_d-~---P6 J~h~-Seybold,
e 1a,
somewhat
------
nc.,
v77.,
Electro-optical
Publishing
___ Frosr--ana-Sullivan;-rnc~,
12
exposure to inherent
from
a
survey
complications,
of
the
methods
history of typesetting.
character
on
a
student
employed
profits
throughout the
For instance, the idea of a "meta"
is introduced in the 1930's for shifting 'rails'
paper-tape-driven
1880's),
the
the
linecaster.
inventers
of
Even
earlier
(the
Monotype developed the system
used even today in measuring characters as values
relative
to the widest character in the font.
Therefore, the first nine chapters
assigned
and
discussed.
current typesetters.
the
logic
the
book
are
This leads the discussion up to
The other
chapters
covered
include
of hyphenation and the specification and actual
programming of composition languages.
some
of
As mentioned
above,
material on the latter topics are derived from actual
composition 1 anguage s of twar e at I I I.
Office Automation
The relationship of composition to word processing and
office
automation
is explored, followed by a proper study
of office automation and
these
electronic
mail.
Material
for
topics is drawn not only from Seybold, but also from
reports from various researchers and
Stanford
Research
detail the
progression
Institute
beginning
from
the
of
and
companies
Datapro.
electronic
mail
as
Xerox,
These reports
as
a
logical
network technology developed in the
form of the ALOHANET and the ARPANET.
The Realities
In this second part of the paper the main notes to the
class
are
detailed
along with a discussion of the actual
progress of the class.
It is here that
the
discovery
is
made that concepts clear to one with prior understanding or
experience
are
not
intimation
alone.
made
familiar
to
the
novice
by
I hope that statement becomes clear as
we progress ••.
13
TECO
The introduction to TECO was
example.
a
accomplished
~!~~~~
Such conventions as how to use the
delimeter
and
invoker)
command
and
representation (a dollar-sign) are given.
visual
Four main genres
Pointer manipulation, Deletion and Search.
Type-out
and
An example page
of FORTRAN code used for demonstrating these
xerographed
by
key (as
its
Insertion,
of command type are then given:
mainly
commands
was
and distributed to the class as well as a copy
of the TECO pocket guide (a mini reference manual).
The lecture content up to this point did not
challenge
the
class,
simple commands.
a
It was after the introduction of TECO
"programning"
to
as it appeared to be an editor with
language
some
that
interest
demonstrated in the form of probing questions
to
seem
as
was
(as
opposed
"Is it true that this effects .•. ? 11 questions).
TECO as
a text processing language is
seen.
Given
enough
the
insight
powerful
I
have
and/or patience, almost any
(N~te
task can be accomplished.
most
that I didn't say
g~l~~lY
--TECO is an interpreter.) For instance, TECO will return a
flag as to whether a file exists on the disc and
be
persuaded
to
throughly
However, it would have taken
teach
language than was time for.
challenging
powerful
all
the
much
capabilities of the
Therefore, with a
simple
but
assignment in mind, I covered many of the more
constructs
execution).
can
to use that file as a data file or as a TECO
program to be executed.
longer
then
(including
macro
The pith of this follows.
14
generation
and
15
TECO is a character-oriented editor and the
II
.
signifies
II
is a place
integers
of
and
the current cursor address.
storage
and
It
text.
can
may
store
act
character
A "Q-register"
two
data
types:
as holding either, but
When
cannot be thought of as holding both simultaneously.
used
for
storing an integer value, it is loaded with that
value by preceding the letter "U" with an
evaluates
to
For
integer.
an
expression
.U1
instance,
that
loads
Q-register "1" with the address of the cursor.
Q-register
names are any digit or letter of the alphabet.
In the same
manner OU1 zeros Q-register 1.
use
with
Q-registers
are
Other numeric operations of
%and=.
The percent operator
increments the following expression while = types the value
of
an
expression
out
to
the
Thus %A would
terminal.
increment the contents of Q-register "A" and QA= would type
out the value of Q-register "A".
value within Q-register "A".
text.
One
easy
A simple "QA" returns the
Q-registers
nX*,
where
n
is
the
(beginning with the
current
operator,
is
and
"*"
a
be
stored
the
text
hold
and
away
and
number of lines to be stored
cursor
position)
Q-register
name.
retrieve this text, position the cursor at
want
also
way to load one with text is to position
the cursor "in front of" the text to
type
can
X
is
the
In order to
the
point
you
type G* where * is a Q-register name.
The contents of the Q-register will be inserted between the
two
characters
addressed
address may be thought of
by
as
the
"how
cursor
many
(the
cursor's
characters
that
precede the cursor in the buffer", and the cursor is always
"in between" characters).
If a G command is issued and the
16
buffer
an
has
integer
stored in it, an error message is
generated.
Another
expression.
This returns the equivalent ASCII code (in the
radix,
prevailing
character
in
the
item
of
interest
is
decimal)
of
the
the
"nA"
default
is
.+n+1th
buffer.
For· example, a "0A'' return the
character code of the character immediately
following
the
cursor,
value
the
while
a
"-1A"
will
return
the
character immediately preceding the cursor.
two
ways
way,
is
to
create
somewhat
algorithmic
equivalent
language.
"command loop".
forms.
"macros" in TECO.
The
to
a
of
Now, there are
A "macro", by the
procedure
in
an
The first appears in the form of a
This, in turn, is usually employed in
first
is
easier:
two
n(command string).
This
syntax (including the braces) causes the string of commands
within the braces to be executed "n" times.
Unfortunately,
braces won't print on the printer used for this
parentheses>
of
this
strings
are
is:
are
string1 is
argument
used in their stead.
commands
with
and
The
hopefully
generates
After string2
executed, control is passed to string1 again.
the numeric argument evalutes to
to as a
"conditional
semi-colon
passed to string3.
command
a
numeric
The semicolon causes control
to pass to string2 if n negative.
is
three
the following interpretation:
the semi-colon.
control
a
has
been
In the event
non-negative
integer,
This loop is then referred
loop"
(by
virtue
of
the
evaluation.) Now, if a search (for a string, of
course) fails while in a command loop, it generates a
for
so
A more general form
(string1 n; string2)string3.
executed
to
paper,
use by a semicolon that may follow it;
zero
or a -1 if the
17
search
So
succeeds.
occurrance
of
the
<sFOO$; Ott>
string
type out the line on which
command
loop
wi 11
search
"FOO" If it finds one, it will
it
is
found,
and
begin
if not, it exits the loop.
again;
an
for
the
Command
loops may be nested.
The other form of macro looks more like a program:
!TAG!
anything between exclams is a
label or comment
Otag$
unconditional go-to; transfers
control to label !tag!
exp"#string1'string2
Expression exp is evaluated.
# is an L, E, N, G. So if
# is L, strin~1 is executed,
then string2 1f ex is lessthan zero.
-- -- -- ----E-=-equaT to zero, N = non zero
G = greater than zero.
If condition not met, control
is passed immediately to '
~!~~E.!.~
Macro
Body
Running
Comments
!START!
-10u1
!LOOP!c
%1
We start the macro here.
Load Q-reg "1" with a -10.
"c" moves the cursor up one
character and the %1 increments the contents of Q-reg "1".
The contents of Q-reg "1" are
used in the " evaluation.
Spaces and carriage-returns are
ignored in the macro body.
Q1"L OLOOP$'
!DONE!
If the content of Q-reg "1" is less-than zero, control
returns
to
label
!LOOP!
command, otherwise
"Oloop$
command
(when
is
via the
contents
skipped
!OONE!
(a comment, but can be
simple
macro
has
the
effect
characters to the "right".
and
the
of
The same
II Qll
are
unconditional jump
.GE.
zero)
the
control passes to label
next
moving
macro
command).
This
the cursor ten
can
be
coded
18
using a conmand loop:
-10U1(0161Q1;)$.
Another useful variable is "Z".
the
"Z"
always
contains
address of the last character in the buffer.
In other
words, it is the number of characters in the buffer.
useful
constants
are
the ascii values of carriage-return
and linefeed, decimal 13 and
variable
can
be
Some
10,
useful
in
respectively.
terminating
The
"Z"
macros.
For
instance, (C.-Z;) will move the cursor "to the right" until
it is moved past the last character in the buffer.
point, the loop is exited.
The
same
may
!LOOP! .-Z"E OFINISH$'COLOOP$!FINISH!.
be
At that
coded
as:
The class also was
told (in the first lecture) than the "L" conmand would move
the
cursor to the beginning of the next line.
i s on the l as t l i n e , t he
" L"
c onma n d
wi l l
If the user
p o s i t i on
t he
cursor past this line.
A macro may be executed by typing its body followed by
two
alt-modes.
prudent)
to
Since
develop
it
is
programs
usually
with
easier
and
(and more
editor,
then
"execute" it, another method for executing macros is given.
As
was
mentioned
Q-register
above,
with text.
it
is
possible
to
develop
a
a
A Q-register may then be "executed"
by typing Mq where q is a Q-register name.
may
"load"
Therefore,
we
macro in TECO's editing buffer then load a
Q-register with the macro body for later execution.
Macros
t hen
l o ad a
be c orne ,
Q-register
I!LOOP!
.-z
e f f ec t i ve l y ,
with
the
sub r o u t i n e s .
above
EOFINISH$'COLOOP$
cannot insert alt-modes
using
macro,
!FINISH!$$.
the
insert
So ,
we
to
just
type
Not quite.
command
We
since
19
alt-mode is the delimeter.
to type 271.
value
One way to insert an altmode is
This is a TECO command that inserts the ASCII
preceding
the
"I" into the buffer, and "27" is the
decimal ASCII value of an altmode.
Given this knowledge, the class
create
a
was
assigned
to
text file of ten lines or so using TECO, then 2)
to write a TECO macro to count the number of lines
editing
1)
They
buffer.
in
the
could write any kind of macro they
wished, so long as it worked.
I instructed them to get the
macro into the editing buffer in text form, and to have the
file of text (to be counted) available on
further
nausea
on
text
To
save
their part, I gave them an incantation
that would load their macro into a
the
disk.
Q-register,
file into the editing buffer.
on the hard copy to the terminal then
then
load
They were to turn
type
the
following
commands:
HT$$
HXA$
HK$
ERFOO$
Y$
HT$
MA$
!Type out macro body!
!Load Q-reg A with macro!
!Clear buffer!
!Find file FOO on dsk!
!Load file into buffer!
!Type contents of file!
!Execute macro in Q-reg "A"!
Their macro was to type out value
lines in the buffer.
of
the
One acceptable solution would be
Ou1
!zero out Q-reg 1 for counter
$ ;%1)
Q1=$$
!search for a literal carriage-return!
!by typing a carriage-return in the !
!search string. Increment Q-re~ 1 if !
!found, exit loop if search fails
<s
number
of
20
Other solutions could involve 1)
buffer
examining
each
Q-register on seeing
a
character
moving
and
carriage-return
through
the
incrementing
the
or
on
seeing
a
line-feed (but not incrementing on both), 2) moving through
the buffer with
the
versions
need
would
"L"
to
conmand.
detect
testing the cursor against "Z".
Both
the
of
the
latter
bottom of buffer by
r ·
The Editor Assignment
The TECO assignment acquainted the class with some
the
options
editor.
In
that
must
be
explaining
of
considered when specifying an
characteristics
the
this
of
editor/language alternative schemes of specifying TECO were
considered.
locks
it
The
into
fact
a
that
certain
it
is
character-addressable
cast already.
unique to other three other editors
One,
Characteristics
were
then
discussed.
which I wrote, is a display-oriented character editor
The
(a list of its commands appear in Appendix A).
two
are
essentially
line-editors.
One of the latter was
the SOS mentioned in the Objectives, the
written
other
other
an
editor
for
use under the UNIX system and is described in
detail in Software Tools. 1
The class was made aware the first day that they
to
write
class.
an
editor
as
part of the requirements for the
They were to hand in an external
the editor they would implement.
certain
minimum
preferably,
in
editing
PASCAL.
teach the
The
reason
class
manipulating text.
were
written
in
how
specification
capabilities,
The
of
The editor was to provide
and
requirement
PASCAL had its rational in staying
SNOBOL.
were
clear
written,
to write it in
of
FORTRAN
and
is
the writing of the editor was to
to
build
and
organize
This included searching.
SNOBOL,
the
task
of
tools
for
If the editor
implementing
a
searching routine and therefore understanding how searching
--------IK~~~i~han and Plauger, QQ~ £l!~' pp. 163-217 •.
21
22
is effected, would not be accomplished as the routines
part
of
the
kernel.
FORTRAN was foregone for reasons I
hope are obvious to the reader.
for data structuring.
to have a common
Thus
bugs
in
are
PASCAL has
more
affinity
SIMULA was foregone because I wanted
basis
for
discussing
coding
problems.
one compiler are worth discussing in class;
allowing two languages would, in my opinion, turn the class
into a programming languages class.
In covering the other editors, I emphasized that
convention
or feature the implementor allows in the editor
molds the other features as yet unspecified.
if
a
line
editor
is
For instance,
contemplated, the implementor must
anticipate the need to modify characters on a
He
is
faced
with
the
decision
of
given
of
those
characters
line.
forcing the user to
completely retype the line or of allowing the
only
each
modification
requiring modification.
If he
chooses to offer the latter capability,
specification
implementation
be consistent with
of
that
command
must
and
commands already specified.
During these discussions the class
in
exhibited
concern
that they had no idea as how to specify an editor:
first milestone assignment in the
exemplified
in
the
project.
So
the
questions
Objectives (concerning editor design)
were posed academically.
This discourse was interrupted by
a deep anxiety on the class's part about writing the editor
in PASCAL;
of
the
only two people in the class admitted knowledge
language.
I decided that I really wanted a common
basis of PASCAL in the project, so I agreed
to
deliver
a
23
few lectures by way of introducing it.
24
An Aside On PASCAL
In order to spare the class the expense of buying
many
books, a copy of
the library.
~~[!~~~~ !~~l~
was put on reserve in
I then strongly urged them to buy a
~~~g~~i~g i£ ~~~£~ll·
Grogono's
a
clear
treatment
valuable routines.
our
(CDC)
of
part,
Dispose
implementation
routine
then
routines that manage
structures).
In
it
in
was unsatisfactory.
Grogono provides a routine to test the Dispose in
implementation,
of
the language and examples of
For instance, the
particular
copy
This book was augmented
with examples of my own design, but for the most
gave
too
a
given
offers replacement Create and Return
unique
data
types
(usually
record
addition to the Grogono text, I informed
the class that there is a documentation file
on
NOS
that
outlines the peculiarities of the NOS PASCAL.
PASCAL was introduced to the class by
quickly
naming
the "gross" or categorical features of the language.
I.e.,
that the complete syntax of the language can be represented
on
two
pages
using
Wirth's
diagrams
as
well
as
the
following:
1.
All variables must be declared
2.
Keywords (reserved words) are
the diagrams
3.
The semicolon is a statement separator
--------lP~t~;- Grogono, Programming
Mass.:
capitalized
in
in PASCAL (Reading,
Addison-Wesley PuoTTsliTng-Co. ,-Tnc:--;-r978).
25
4.
There are four built-in data types
a) real
b) integer
c) boolean
d) char(acter)
5.
There are also four built-in data structures
a) array
b) record
c) set
d) file (sequential in standard PASCAL)
6.
There is facility for user-defined data types.
Example:
Type
iodevice
= (diskdevice,
printdevice, readdevice, drumdevice)
The four scalars are the only values the type
iodevice can assume.
7.
There is dynamic stora~e allocation (but not
for arrays unless defined as a component of a
record).
8.
Associated
variables.
9.
Labels are
declared.
with
records
unsigned
are
integers
typed
and
pointer
must
be
10.
There is a CASE statement (but has no "others"
label).
11.
The
FOR
FOR
The
12.
Procedures
recursive.
13.
Can call by reference or by value.
14.
All blocks are named.
15.
The boolean operators are AND, OR, and NOT.
FOR statement can take one of two forms:
id := exp1 TO exp2 DO statement
id := exp1 DOWNTO exp2 DO statement
implica,tion is the "step" is always "1".
and
Functions
are
automatically
The remainder of the PASCAL lecturing was divided into
two
genres:
providing
in-depth
examples
of
the above
features and pointing up caveats involved in using the
PASCAL.
Most
of
the
involved familiarizing
time
the
in
class
NOS
covering the first genre
with
the
record
type
definition facility and set definition and manipulation.
I
26
began the record definition material
setting
up doubly-linked lists:
The data part of the node was
This
was
first
to
opportunity
progrrumming level of the class.
TECO
an
example
of
one record type defined a
node.
the
with
be
a
character.
I
had
to
guage
the
They
had
performed
the
assignment quite well and I am not sure as to whether
it was due to cooperation among members of the class or
hints
my
in delivering the TECO material and assignment.
in dealing with the list
asked
me
to
explain
example,
what
a
a
number
of
students
linked-list was.
This, of
course, led to a short treatement of linked-lists.
PASCAL
handles
file
I/0
was
thought the class needed some
PASCAL.
So,
in
concert
But
The way
then covered in detail.
sort
with
of
the
spring-board
lecture,
I
into
I wrote an
example program using a PASCAL implemented in Hamburg, West
Germany
that
runs on a DECSYSTEM-10.
The program appears
in Appendix B.
The program
was
examined
in
class.
The
"program
declaration" heading differs from CDC's, and was pointed up
in class.
Before
characteristics
any
assignment
unique
to
text files were discussed.
with
I/0
in
CDC's
was
made,
covered
implementation.
It seems NOS interferes
the following way:
First,
a
bit
any line is stored an an
integral number of words at ten characters per
an
we
word;
and
end-of-line condition is represented as 12 bits of zero
(two six-bit quantities).
positions
Now for
the
catch:
character
between the last non-blank character on the line
and the 12 bits of zero that represent the end-of-line
are
27
padded
out
with
blanks.
means the user can expect
before
detecting
the
Under
up
to
certain conditions this
eight
end-of-line!
blank
characters
Second,
the
string
delimeter characters in the CDC implementation are #'s, not
apostrophes.
Third,
variable is not an
changes
are
not
the character for defining a pointer
up-arrow,
out
of
an
apostrophe.
There are only 64
Appendix C.
it.
The
character
set
is
given
in
Fourth, as mentioned above, the CASE statement
has no "others" clause, and the compiler is
that
characters
under the NOS timesharing subsystem-- and there
is no up-arrow in
in
These
capriciousness, but result from
CDC's limited character set.
available
but
the
does not match
not
forgiving
program will abort if one of the CASE labels
the
selector.
While
this
seems
to
be
"standard"
PASCAL,· it is prudent to note that most, if not
all,
statements
CASE
should
be
enclosed
in
an
"IF"
statement for safety.
The thought was raised in class that there is
another
(simpler?) way to read a string of characters into a linked
list.
One appends each character to the beginning
list
of
the
(thus not requiring a pointer to the last node of the
list).
When the string is completely read in,
one
simply
reverses the list.
The
completing
class
an
was
offered
its
choice
a
another list.
methods
in
assignment to read the entire contents of a
file into memory in the form of linked
represents
of
line
lists.
Each
list
and the lines are to be connected with
Thus we have a linked
list
of
text
lines
f
'
28
where
each
line
characters.
is
represented
Then the "buffer"
is
by
a
linked
to
be
written
list
of
to
an
output file.
The motivation for making
twofold:
the
above
assignment
was
to familiarize the class with the NOS PASCAL and
to give them a kernel from which to develop the rest of the
editor.
In
completing
the assignment, the class had its
input and exit routines.
By this time it was becomming clear
might
be
implemented
representation.
discussed
I
The possibility
but
consider
the
editor
linked-lists
for
buffer
of
using
an
array
Q-register-type
store
text)
a
facility
(somewhere
when
to
the
conceived as lists than as one long array that
is not dynamic.
Thus a Q-register
is
lists rather than managing free "core".
loaded
by
of
getting
"Disposing" it.
available memory.
copying
With this in mind,
I tested the NOS PASCAL resident Dispose routine.
consisted
to
basic component of an editor.
The task of memory management seemed less imposing
class
was
not encouraged much on the following basis.
a
temporarily
using
that
The test
an instance of a PASCAL record then
The program aborted due to
exhaustion
of
This is clearly not the action one would
desire.
Therefore, I mentioned to the class that we should
define
a
separate
New
and
Dispose
(under
identifiers) for each record type that is used
by
the program;
different
dynamically
because, among other reasons, the pointer
variables receiving pointers to these records are
typed.
The
assignment
strongly
was trivial as the procedures are
29
given, as mentioned above, in the Grogono text.
Perhaps the last major topic covered under PASCAL
definition
and use of sets.
We decided it would be useful
t o emp 1 o y t he s e t type i n i den ti f y i n g char a c t e r s
from the terminal.
well as "alpha".
allow
the
was
r e t r i e v ed
Thus a set "digit" could be defined, as
Unfortunately the
declaration
"TYPE
CH
results from the convention of
strings on 60-bit words.
NOS
PASCAL
= SET
OF
does
not
CHAR".
implementing
sets
This
as
bit
Only one word is used and, sadly,
there are 64 characters in the timesharing
character
set.
Therefore, as "illumination" to a lecture on sets and their
manipulation, I wrote and ·handed-out a program that defines
and employs sets;
it appears in Appendix D.
In discussing features of PASCAL with the class, I was
able
to
fit
examples to the discussion at hand, that is,
text editors.
specifying
So
when
returned
our
discussion
them
the
but
not
of
specifying
one.
and they unanamously chose the latter.
appears to have been the better choice.
class
user"
masse
In retrospect, this
I was able to turn
into more of a seminar than a lecture (at least
for awhile) by continually asking what may be
naive
I
choice of specifying the editor each of
them would eventually implement to specifying one en
the
to
an editor the class had gotten over some of its
fears of writing one -offered
we
questions.
termed
"the
For instance, the user would like
to see, from time to time, the current state of all or.part
of
his
editing
buffer.
The
class
considered from the
designer's vantage the implications of allowing the user to
30
specify
"type-out"
in
absolute and/or relative commands.
The same question appeared in
lines.
When
we
tried
terms
of
the
deletion
for a plurality in specifying the
action of the insert command (and the form it should
the
class
became
embroiled
"controversy".
As
continually
think)
(I
the
reminded
I
them that each syntactic
the
as-yet-to-be-specified
syntax;
and
that
definitions
begged the definition of others.
commands
was decided that the user
only.
1 ines?
of
progressed,
that the user must be spared the complication of
non-symmetric
lines
take)
in what can only be termed a
specification
construct implied the form
commands;
of
So
could
insert
some
For example, it
or
delete
whole
how was the user to "merge" two adjacent
The "merge" command was thus born:
"deleted"
of
it
essentially
the conceptual end-of-line of the first line and
made the following line
part
of
the
first.
The
class
responded immediately that we needed to specify an opposing
command;
one that could split
user-specified
point
one
a
line
command
into
two
that
I
at
a
had
not
command
per
anticipated.
The subject of allowing
user-input
line
restrictions.
statement
more
than
one
was rejected as a requirement due to time
This
allowed
the
class
to
use
CASE
for a control structure to the editor (by having
a set of legal commands, then dispatching within
statement).
that
the
case
Also, a method for saving and retrieving text
in Q-registers was discussed.
routine
a
replicated
the
One simply
defined
a
copy
line and character nodes of
31
desired lines (for saving) and the same for retrieving (new
nodes are obtained, not pointers to data in the buffer).
Later in the semester I handed
reiterated
the
longer
to create it.
than
assignments.
"Q"
document
a
few
As
and
the
RUNOFF
source
that
It
file
One student requested that I give them
weeks
the
only
editor, I assigned all the
"Change",
a
consensus on the editor specification.
appears in Appendix E along with
used
out
to
complete
assignment
capabilities
progrrumming
pending
except
was
the
"Search",
"G" to be completed 4 weeks before the
final, with the balance due a week before the final.
Pattern Matching
The editor discussion was ended
In
searching ,techniques.
simple
searching
Sahn i.
However,
by
a
discussion
particular,
I
described
the
presented
by
Horrowitz
and
algorithm
I
had
of
meant to allow this algorithm as
sufficient for our editor and therefore decided code it and
test
it in PASCAL.
I thought the rest of the editor would
be challenging enough (for one semester's work).
I
coded
it exactly the way I found it in the book and it terminated
with an "invalid pointer reference".
NIL
pointer
value.
Upon
This usually means
investigation,
a
I discovered I
could not use the following construct:
WHILE (NOT(P=NIL) AND NOT(Q=NIL) AND (P'.DATA=Q'.DATA)) DO
BEGIN P:=P'.NEXT; Q:=Q'.NEXT END;
It seems that PASCAL evaluates
even
if
P=NIL or Q=NILL.
the
whole
predicate,
Thus the (P'.DATA=Q'.DATA) will
eventually always attempt to utilize a NIL pointer.
I left
that particular code within a comment in my program FIND in
Appendix F for illustration (and in case the above
is
vague).
I
found that I had to write some "ineligant"
code in order to get around this shortcomming.
seen
context
As
may
be
upon inspection, I attempted to remain as faithful as
possible to the original algorithm, but
my introducing another variable OLDP.
the value of P (which points into
the
this
necessitated
This is used to save
target
string)
in
anticipation of P being set to nil in the event the pattern
and target strings are of
pointer would be lost.
equal
length.
Otherwise,
the
Other "ineligance" is introduced in
order to ensure termination of the first loop.
32
33
As mentioned above in passing, this procedure is given
in
its entirety in Appendix F.
pattern and the target
algorithm
(for
string.
It assumes two lists:
I
decided
to
Appendix G and is called FINDA.
This appears
This algorithm differs
in one important way from FIND (because the pattern
an
array}.
is
in
We find that yet another piece of information
must be supplied-- the length of the pattern.
becomes
the
illustrative purposes) so as to assume the
pattern to be in an array rather than a list.
in
code
the
obvious
The
reason
when we consider that with such a limited
character set, there is no convenient character to use as a
delimiter.
We can't use a zero or any other non-character
data-type as the array is defined as consisting of elements
of
type
"CHAR",
and
character in the
pattern.
array
PASCAL
may
enforces
be
this.
considered
part
Thus, any
of
the
Therefore, actual length of the string is passed
as a parameter
to
FINDA.
In
other
respects
FINDA
is
In ~~~Q~~~~!~l! of Q~!~ ~!I~~!~I~!' algorithm FIND
is
similar to FIND (and in fact, simpler}.
followed
by
another algorithm NFIND which exhibits better
characteristics (in terms of number of characters
to
achieve
a
match}.
examined
As mentioned in the Objectives the
algorithm compares the corresponding last character in
strings
before
starting the actual loop.
the
Of course, this
algorithm can exhibit behavior as bad as FIND, and this was
mentioned to the class.
34
The
TECO
emphasis
on
algorithm
the
was
preprocessing
achieve generality and speed.
pattern
string
characters.
discussed
of
next,
the
with
an
data in order to
The user is able to build
that allows specification of special match
In the following table a control character
as
represented
two
characters,
followed
a
example,
the
match
a control "X" is represented as 'X.
characters
consist
of
a
control
is
by
the
character to be struck while holding down the control
For
a
key.
Some of
character
followed by another character, a few follow:
Function
accept any-cnaracfer in this position
accept any separater. (punctuation,
tabs and spaces)
accept any alpha (of the 52 upper+lower)
accept any digit
accept any non-null string of spaces
accept any of the characters specified
accept any char except the following
Character
Tx------'s
'EA
'ED
'ES
'E chl, •• ,chn
'N
The control-N needs explanation.
modifier
to
a
match specification.
search string were
meaning
~~1
'N'EA
(control-N,
It
is
used
as
a
For instance, if the
control-E,
A)
the
is "aceept any character in this position if it is
an alpha character.
staged.
In
the
The algorithm is conceptually
first
stage
the
pattern
"compiled";
in the second, the pattern array is
determining
matches.
two-dimensional.
position
The
array
may
be
The columns correspond to
array
used
thought
the
two
is
for
of as
character
in the pattern, while the rows determine matching
status.
The depth of a column is set to 128 (corresponding
to
cardinality of the character set).
the
In this stage,
the array is set to zero then the first character
is
"set".
A
position
string-specifying character is read and the
35
appropriate cells of that column are exclusive-or'd with
'true'.
In
non-case-sensitive
mode, say an 'a' is read.
The ASCII value of the character is used as an
the
a
column for that character position.
index
into
The program notes
that the character was alphabetic and so subtracts an octal
40
from
its value, thus making it upper-case.
is also used
elements
as
an
index
into
column
This value
one.
Thus
of the column have been set.to true.
two
The program
then increments the column index and the second position in
the
pattern
string
is
set.
For an example of a special
match-control character (from the table above), let's say a
control-X
is
read.
The control-X, as may be remembered,
specifies that any character found in this position of
search
string is accepted as a match.
in column two
operator.
are
set
true
Thus, all the cells
using
the
exclusive-OR
if
the
control-D, we would have
set
only
values
the
ASCI I values of zero through
nine.
Obviously,
to
the
corresponded
to
character
the
cells
been
whose
a
row
Using this mechanism, the control-N operator becomes
trivial.
Upon
reading
a
control-N
in
setting
pattern string, all the cells in the column
the
had
inclusive-OR operator.
are
up the
set
with
After setting these cells, the
column number is not advanced.
Therefore, when
any
other
arbitrarily complex specifier is encountered, the action of
that specifier is to turn off those bits
with
well.
effect
with
an exclusive-OR.
it
is
concerned
Our first example would serve
If the 'a' had been read following a control-N,
would
have
been
to
set
to
the
false the two cells
corresponding to upper and lower case 'a'.
36
The second stage of
The
above.
algorithm
the
algorithm
attempts
to
match
"string" with characters in the buffer.
found,
the
resembles
If
the
no
"FIND"
pattern
match
is
buffer pointer is moved only one character and
,q
the match is tried again.
a
very
few
However, this is carried out
instructions as the algorithm is written in a
powerful assembly language.
found
to
A character in the
the
buffer
The
instruction
accumulator as an index register into the row in
the pattern table corresponding to the ASCII value
character in the accumulator.
of
the
buffer
if the cell is non-zero (i.e., the characters matched).
the characters did not match, the process is repeated
character
the
It skips if the cell is zero
and returns to pick up the next character from
the
is
match by loading it into an accumulator then the
determination is made in one instruction.
uses
in
If
with
following the first character in the target
string as the new beginning of the target string.
Unfortunately, after
there
was
insufficient
the
time
TECO
to
search
cover
algorithms mentioned in the Objectives.
formatting had come.
was
covered,
any of the other
The time for
Text
, Formatting
Two format programs are discussed:
Tools
and
the
other,
RUNOFF.
modeled on RUNOFF, but is written
system.
one from
Software
Actually, the former was
in
RATFOR
for
a
UNIX
Most of our discussions concerned the former, but
RUNOFF was appealed to for justification.
by
Kernighan
and
purpose of a formatter is to
neatly format a document that is pleasing to the eye 1 • The
" p 1e as i n g
t o t he eye" i s mi n e and imp 1 i e s a mi n i rna 1 de g r e e
of page layout.
Plauger,
As mentioned
the
The commands implemented in this formatter
are characters in length that always follow a period.
Some
representative commands follow:
Command
Function
toiiTii-T5 "fill" mode
turns off "fill" mode
force out current output line
and start new line
produces a blank line
sets spacing on the page
"begin page" -- skips to top
of new page, numbers it "n"
center the next "n" lines
underline the next "n" lines
sets left margin
sets right margin
do temporary indent of
"n" spaces
-:-rr--.nf
.br
.sp
.ls n
.bp n
.ce
. ul
. in
. rm
.ti
n
n
n
n
n
Most of the commands are
setting
of
margins, etc.).
specifying header and
head"
for
"status
setting"
{the
There is also a provision for
footer
lines
{meaning
a
"running
{foot) that is put at the top (bottom) of each page.
The treatment of this genre is quite good, in that the text
is
exemplifying
therefore
shows
prograrrming
the
by top-down methodology.
development
37
of
some
of
the
It
more
38
important routines through up
four
generations.
that i t detects whether we are at the top of
past
puts out the current line;
the
footer
then a
"buffer"
is
is
This
invoked.
typesetting programs, as
justifier
is
one
or
made
to
see
One level down, we
more
similar
will
If so,
word
the
structure
be
into
the
justification
is
later
similar
discussed.
to
The
to typesetting programs in that all
the "spacing" is done before any of
That
page
responsible for detecting whether the
word will overflow the measure.
routine
is
test
should be output.
find that the routine that puts
output
the
bottom (in which case the header is printed) and
the
whether
The
code is encapsulated in the output routine in
layout
page
to
the
line
is
output.
is to say, the output buffer is an array within which
the text is shuffled until it is justified "in
afterwards
core",
and
the contents of the array are output (up to the
"linewidth").
Another useful concept pointed up is that of
distinguishing between the number of printing characters in
the output buffer as opposed to the number
The
non-printing
characters
be
typesetting
characters.
backspaces
underlining) or
later,
characters
controlling the typesetter.
for
in
may
of
programs,
(for
"meta"
Thus the user
should be sure the total number of characters in his output
buffer does not overflow the buffer length.
The
concept
of
formatting
programs.
called
another
by
formatter.
Its
the
"break"
is
central
to
text
A break may be issued explicitly or
of
the
routines
that
comprise
the
action is to force the output of the line
39
in the
output
discovering
a
justification.
and
in
without
"begin
page" or "space n lines" command, a
call to the break routine is made to
text
Thus,
buffer
finish
the
previous
insure the text following appears in the correct
place on the document.
It was mentioned
accepts
above
the
underline
command
a numeric argument (corresponding to the number of
lines to be underlined).
just
that
a
Also, if we wanted
to
underline
few words of a line, we would put these words on a
separate line.
Since
the
"underline"
command
does
not
cause a break, the words on that separate input line appear
on
the
same
line
implementation
is
as
the
text
encompassing
straightforward:
it.
Its
immediately after the
underline command is parsed, the next line is read into the
input buffer.
Each character of this buffer is copied into
a temporary buffer -- but with a
character
word.
the
following
each
backspace
character
of
and
underline
the input buffer
Then the whole temporary buffer is copied back
input
buffer
and
the
words
into
are sent to the output
buffer by the normal operation of the program.
The centering routine is even simpler:
a
variable
"tival"
it simply sets
that is sensed by the output routine.
Tival is mnemonic for "temporary indent",
and
routine
of spaces before
simply
outputs
outputting the line.
"tival"
number
the
output
40
Probably the main drawback I found with
the
algorithm for justification is that it is tricky and
Tools
elaborate (the word "tricky"
None
the
less,
is
their
own
description).
the algorithm was discussed.
afterward, however, I presented the algorithm
uses,
which
former
in
that
"justifying" spaces to the output line as
£~!~~!(i.e.,
line
Immediately
that
RUNOFF
seems, to me at least, more accessible.
algorithm differs from the
the
Software
it
That
adds
the
the!!.!!.~!.~ ~~!.!!.K
On
sent to the output buffer).
discovering
to be full, RUNOFF calculates two values "EXSPl"
and "EXSP2".
The first is the number of spaces to be added
~~~£~Q!.!l£~~!!Y
to each interword space in the line, while
the latter's interpretation is contingent on whether we are
weighting
the
"white
space"
to the left or right of the
line.
That is, the number of extra spaces
is
added
to the right or left of the output lines in order to
avoid "rivers" -- gaps of white space on one
page
or
other.
So,
side
add
a
words.
Thus
If
or
"real"
space
extra
spaces
to the right.
we
are
a
of
space
on
the
the
biasing
Therefore an extra space will be
added between words 7 and 8, 8 and 9 and 9 and 10.
diagram
to
we need to add three spaces to the line to make
it justify, exsp2 will be equal to six if
the
space
if we have a line with ten words on
it, there are 9 interword gaps
1 i ne.
the
output space, otherwise it is a count of the number
of interword gaps to pass up before adding an
between
of
if we are biasing the spaces to the
left, EXSP2 is the number of times we
"real"
alternately
A
flow
algorithm that was presented to the class
appears in Appendix H.
41
At this point the class was given a midterm.
of
The text
both the midterm and the final appear in Appendix I.
I
considered it a somewhat easy test, but then again, I wrote
it.
The
results
were
a
bit surprising to me in that I
thought people were absorbing the material better than
demonstrated.
There were excellent scores:
high 80 and 90 per cent
range.
The
scores were 97% and 28% respectively.
highest
was
5 in the very
and
lowest
Typesetting 1
This
section
typesetting
methods.
were
concerns
the
of
business
-- by both "early machine" and phototypesetter
The corresponding chapters covered
1-9,
teaching
12-15
and
in
the
book
The content of other chapters
21.
(that deal with word processing) are dealt with using other
sources.
The area
difference
is
introduced
between
with
typewriters
a
discussion
and typesetters.
Executive is an example of "cold type" in that no
cast.
If
any
plates
are
made,
it
is
a
of
the
The IBM
lead
is
result
of
photographing a typewritten page produced by the Executive,
then
making
offset
plates from that.
The Executive also
introduces the student to the idea of proportional spacing.
This
the
is
case
in
which
proportionately different for a
an "1".
the
C~?ital
machine
"M", say, than
for
The concern here is for the appearance of the text
type is traditionally designed so that
are
"escapes"
naturally
wider
than
others
some
characters
and the Executive is a
compromise between a typewritter and a typesetter.
Since this was an introduction phase,
its
use
is
also demonstrated (verbally).
device for locking rows of
type
close
the
chase
and
The chase is a
together
for
the
purpose of either making plates or printing directly onto a
42
43
,,
In such a device, all the lines are required
page.
of
the
same
length.
Therefore,
added.
are
From
this
came
the very right edge of the chase.
(called
lead
concept
the
justifying lines, where the last word of the
at
be
a line is not wide
if
enough to fit in the chase tighty, pieces of
quads)
to
line
of
appears
This is accomplished
by adding pieces of metal between the words (and
sometimes
between the letters), and is called "inter word" and "inter
letter"
The
spacing.
comparison
between
introduction
the
includes
linecasting
short
a
machines
and
the
Monotype, but I will postphone such a discussion until they
are treated at length.
Probably
the
most
obvious
difference
between
typesetters and typewriters is that of character repetoire.
For example, a typewriter has only a hyphen
it
character
and
must fulfil the role of the n-dash (used for separating
say, dates:
Another
1888-1960) and them-dash--used for
difference is that of achieving justification more
smoothly than a mono-spaced font
device
emphasis.
can.
On
a
mono-spaced
like a typewriter, justification cannot be achieved
by distributing small increments of space among each of the
inter-word
spaces
increment
printing
are
is
attributes:
the
ease
the
because
same
concerned
the
1
with
smallest
unit.
Thus
word
largest
traditional
the aesthetics and two other
readability and legibilityi.
Readability
is
of reading a printed page (the type arrangement)
and legibility refers to the speed with which
or
and
can
be
recognized
(the
type
each
letter
design).
And of
44
course, a printer also has bold and italic
fonts
to
draw
upon in order to draw attention to specific phrases.
For purposes of discussion, it should
there
are
two
mortise.
to
noted
that
72 "points" to the inch and one "pica" is equal
to 12 points.
type,
be
In detailing some of the characteristics
terms
come
to
mind:
the ability to kern and
Kerning is the process of causing two
"overhang" one another.
characters
For instance, if we wanted the
cross of a capital "Q" to underline the next character,
could
the
cross
Mortising
is
so
overhangs the body of lead it is cast on.
Thus any character that follows will fit under
another
aesthetic
nicety
the
cross.
wherein
characters that normally would be spaced too far
they
we
put both of the characters on the same piece of lead
(and thus creating a ligature) or we could cast the "Q"
that
of
two
apart
if
were to "sit" on their respective bodies are "pulled"
together by physically sawing off parts of
bodies.
This
each
of
their
could be desired in the case of a capital V
followed by a capital A.
-------------***-----------------------------------*** t--- ASCENDER
***
-------------***-------------------------***-------*** *****
*** *****
****
****
*
****
*
*
*
X-HEIGHT --:t ***
*
*
***
*
*
***
*
*
*
***
*
***
*
*
***
*
*
****
*
*
*
****
*
*** ****
****
*** ****
base line----***-------------------------***-------***
DESCENDER -----------~ ***
***
-----------------------------------------***--------
45
Referring to the above figure, the point size of
is
generally
given
as
the height of the body, where the
body is the distance from the top of the ascenders
type
type
of
to the the bottom of the descender of the type.
one
font
Note
that a lower
~ase
"l" may be larger
lower
"l"
of another font of the same "point size".
case
in
the
than
This may occur because the lower case l "sits" on the
a
base
line -- and the base line in one font may be in a different
relative position than
position
the
base
line
of
another.
of the base line is determined by the x-height of
the font, where the x-height is the height of a
x
(it
has
no
ascenders
x-height of a font,
the
or descenders).
shorter
its
descender
capital
base-line also;
so some, in turn, are larger in
letters
are
another of the same point-size.
lower-case
The larger the
Unfortunately,
than
The
can
be.
on
the
dependent
one
font
This in turn creates
_,
the problem of getting characters of various fonts to "base
aiign", meaning they all sit on the same imaginary line.
This brings up the topic of leading.
distance
between
"solid setting".
space
added
the
base
lines
spaces
between
rows of type.
out
is
the
running text, in a
A solid setting implies there is no extra
consecutive lines of text.
·leading derives from the days in
were
of
Leading
by
which
The term
consecutive
lines
inserting strips of lead between the
It is used to make a page more legible;
the
useage is "11 on 13" where "11" refers to the point-size of
the type and "13" refers to
base-lines.
the
actual
distance
between
This implies there is "two points of leading"
46
between the lines.
Another handy term of measurement is the idea
"em"
space.
fact
the
It is the amount of horizontal space that is
equal to the point size one is setting in.
the
of
It derives from
that the capital M was traditionally square, and
that it is the widest character of the font (normally).
"en"
space
space.
is
either
1/2
an
"em" space or 1/3 an "em"
The "em" concept is very handy in
specifies
a
relative
width.
that
the
em-space
(to
always
be
defined
is the widest character in the font,
and all the other characters are defined in
em-space.
it
When using typesetters that
use a relative-width system of escapement
later)
An
terms
of
the
Thus if there are 18 relative units, an em-space
is 18 units wide, while an en-space would be 9 units (if it
is defined as 1/2 em).
If that were not
"set"
should
help
confusing
us along.
enough,
the
concept
of
It has two definitions.
In
terms of the design of the original type,
case
when
it
defines
the
the width of the em-space has been narrpowed or
widened (the height remains the same).
Thus if a character
is 6pt/8set it means it is 6 points high and 8 points wide.
8pt/6set implies the opposite.
types e t t e r ,
subtract
achieve
width
from
a
of
it
is
the
2)
In
the
context
of
a
t he amo u n t of " s i de bear i n g" t o add o r
character's
normal
width.
Thus
we
6pt/8set appearance by adding two points to the
each
character
that
is
set.
With
photocomposition device, the amount can be negative.
a
-
- - ---- --
-
- -- -
-....-- -----
---~------~---
------~----------
I
First Generation Typesetters
Seybold concerns himself with two genres of typesetter
in
t hi s
c ·a t ego r y :
typesetters.
1 i n e cas t e r s
and
Mono type-s t y 1 e
The Linotype is covered first.
The Linotype is still used in the United
States,
and
has was the most popular style of typesetter throughout the
first half of the Twentieth Century.
hand-composition are significant.
Its
advantages
over
First, a printer can set
over 6000 characters per hour on the Linotype as opposed to
2000
characters
per
hour by hand.
Second, the matricies
(type bodies) are automatically recirculated back into
machine's
matrix
magazine (this implies sorting them into
the proper positions).
composes
slug.
be
a
line
at
The Linotype is so nruned because it
a
time, the lead "line" is called a
The slugs are "recirculatable" also in that they can
re-melted
for
reuse.
Third, justification is simple:
when the operator strikes the space bar, a "spacing"
is dropped into the setting channel.
be
wedge
When the line becomes
within "justification range" (the point at which
may
the
the
line
justified aesthetically), the operator may command
the machine to "set" the line.
At this point
the
machine
pushes the spacing wedges causing them to become wider (the
wedge consists of two "wedges" that,
each
when
other, form two parallel planes).
thus force the words apart with an equal
between
against
The spacing wedges
amount
of
space
adjacent words, until the line is spaced apart the
width of the measure.
injects
placed
Immediately afterward,
the
machine
lead on top of the matrices to create a lead block
47
•
48
of type.
When a hand-compositor tries to achieve the
effect,
he
must
do
it
by
experimenting
with
same
various
interword spaces until the correct width is achieved.
last
advantage
one of order:
(and
are
of the linecaster over hand-composition is
blocks of type are easier to keep
all
the
same
width)
individual pieces of type.
"lost"
The
than
in
order
lines comprised of
The individual pieces are
also
for the period of time they are used for the actual
printing.
The Linotype has four divisions.
magazine
character
Each
of
the
matrices
is
the
two
contains
a
standard
and an ltallic of that character (usually).
magazines are interchangeable and
faces.
first
that contain the matrices (a magazine corresponds
to a font).
have
The
thus
the
The
operator
may
magazines -- one for roman and one for bold type
Then thate is the keyboard which releases into
"channel"
one
matrix
per
corresponding
the
keystroke.
As
mentioned above, there is a casting mechanism which injects
the
lead,
and
finally
recirculates
each
utilizing
special
a
the distribution mechanism.
used
matrix
sort
into
the
magazine
the
originates:
terminology
"upper
must
and
occurs,
of
the
and
"lower
~apital
Linotype
is
case"
letters!
is
if
a
be made on a line, the whole line must be
recast and possibly the
widow
case"
the upper case contained the
One of the disadvantages
correction
by
key on the top of each matrix.
Some linecasters may be fitted with two magazines,
where
This
the
surrounding
whole
page
is
lines.
sometimes
Also,
if
a
reset to a
49
Obviously, pages are "pulled" throughout
tighter set size.
the
operation
analyze
to
for
errors.
With composition
software for second- and third-generation typesetters,
the
detection of a widow is detected before any copy is pulled.
Other more obvious
drawbacks
are
that
matrices
may
be
mixed-up (they are sometimes dropped into the assembly area
of the machine out of order-- a "feature" of the machine.)
The Monotype was introduced a year after the
(1887).
This
Linotype.
It is divided into two main
keyboard
and
machine
the
differs
caster.
Linotype
substantially
The
from
constituents:
blowing
air
against
the tape.
In the
e~ent
through a hole, it blows against a small piston,
turn
actuates the setting mechanism.
a
mechanism
for
determining
justification range.
counter
which
which
the
Because the keyboard
number
sum
of
the
There was
of
These counters were
units
used
a
mechanical
widths
also
a
of
the
counter
tape
to
calculate
the
required for justification, which number
was punched at the end of the line.
backwards
to
access
the
The caster would
read
amount of space that
needed to be added to each interword space as the line
cast.
be
tallied the number of interword spaces that appeared
on the line.
the
by
provided
characters as they were keyed.
which
in
whether the line was within
This was
counted
hole
air gets
is physically separate from the caster, there needed to
a
the
two communicate by paper
tape, the caster senses the presence or absence of
by
the
was
Justification is achieved by adding discrete amounts
of space between words, as opposed to a Linotype.
50
In order to tabulate all these runounts,
there
needed
to be a counting system to relate all the character widths.
A relative system was developed wherein there were so
"units"
to
the
"em"
(which
is, as may be recalled, the
widest character in a font).
chose
many
The
inventors
of
Monotype
to implement a system in which an "em" was 18 units.
Thus, an "en" space would
system
will
work
for
be
any
9
units.
width
"em".
for
system
that
this
point size, as characters are
measured only in relation to their
This
Note
caused
type
designed in specific quanta (in
relative
to
an
the monotype to be
relation
to
each
other,
rather than randomly as for the Linotype).
The casting component contains a removeable matrix (in
the
mathematical
sense
this
time) in that it contains a
number of "combs" locked side by side, where a
row
of
character.s.
the
machine at one time.
of the same set size, and
relative
units
customized.
matrix
to
a
18,
"normal"
range
although
to
a
wedge.
essentially a "hardwired" table.
wires
are
available
is
usually
this
given
The
can
be
from
5
easily
configuration
of
normal
wedge
is
are
no
Although
there
involved, given notches on the wedge signify a given
set size.
Thus the wedge describes
combs
the
in
matrix.
the
configuration
relative
units
of
If the user decides to reconfigure
his matrix by making one of the rows contain characters
9
a
Each row contains characters
the
Corresponding
is
is
The combs contain 15 charactes each,
and there are 15 combs, thus 225 characters
on
comb
rather
of
than say, 12, another comb that
51
reflects this data
Another
must
be
inserted
into
the
machine.
correlation that must be dealt with is that of the
code punched onto the tape by the keyboarder and the actual
piece
of
type that appears in that position in the matrix
case (a modern equivalent would be changing type "balls" on
an IBM selectric typewriter).
The actual casting process proceeds in
manner.
the
following
The caster reads the tape which causes the matrix
case to be moved in a manner such that
appears over a "casting" hole.
the
proper
At thus point a square tube
is fitted to the selected matrix and molten lead is
into
the
tube
and
impression of the type
against
to
be
repeated until the line is set.
matrix
the
forced
matrix face, causing an
formed.
This
process
is
TTS
Perhaps
invention
the
of
most
these
development
important
occurred
typesetters
after
in the 1930's.
Walter Morey invented the Teletypesetting code.
It
system
(to
for
sending
typesetting
typesetters) over telephone lines.
quickly
as
would
information
to
be
was
It didn't catch
be expected for two reasons.
newspapers didn't want
locked
the
into
a
a
drive
on
as
First the
particular
editorial content, and second, the local typesetting unions
did not want to lose the local keyboarding
~ork.
However,
by 1951 UPI began sending pre-justified anyway because some
newspapers by that time were using TTS internally to
their
Linotypes.
This
implies,
of
course,
that
the
to
the
matrices in Linotypes had to have been standarized
32-unit-to-the-em
This
was
done
estimating
rounding to
with
the
system
not
a
by
(TTS
instituted more accuracy).
redesigning
micrometer
nearest
drive
the
the
1/32
of
type,
but
by
value of a matrix and
an
em.
This
caused
rounding errors that usually balanced out by the end of the
line, but not always.
The tapes were punched on a
that
typewriter
resembled
perforator".
a
In the very early
called
days,
a
Linotype
machine
"multiface
keyboards
were fitted with punches connected to solenoids, so that as
the
paper
tape
automatically.
was
read,
the
keys
were
punched
Later, of course, linecasters accepted tape
directly.
52
53
TTS is a descendent of both the Baudot
Murray code.
and
the
The Baudot scheme allows the user to strike a
chord on a 5-key keyboard.
providing
code
32
distinct
used a keyboard
that
The cord is
fed
a
serially
The Murray code
representations.
resembled
out
typewriter.
It
also
produced a 5-level code, but in parallel, much in the style
of the
modern
alphabet,
paper
It
tape.
used
26
codes
as
the
and two for line feed and carriage-return codes,
respectively.
There was also a shift code that acted as
a
When in "shifted" mode, the codes became figures.
toggle.
There were no lower-case characters.
Morey
adapted
this
system to a 6-level code, thus adding 32 codes, and used it
to
drive
the
information
typesetters
theory
described
(wittingly
or
above.
not)
used
He
and assigned the
characters used most frequently to the representations with
the
least holes.
This minimized tape wear;
one-hole representation.
conjunction
lower case
The
addition
a space had a
of
32
codes
in
with a shift-precedence code allowed upper and
characters.
In
addition,
he
assigned
such
typesetting commands as:
thin space
quad left
upper rail
lower rail
em space
rub out
quad center
Upper rail would
matrix,
which
select
the
upper
portion
of
the
would have the roman font as opposed to the
lower portion which would contain the itallic.
' .
The Advent of Photocomposition
The
devices
generation
described
typesetters".
herein
The
are
termed
reason is that while these
machines accomplish composition by exposing a
medium
through
a
"master"
"first
character
photographic
image, they do not
substantially differ from their hot metal progenitors.
instance,
time.
they
are
capable of setting only one line at a
Then the paper is moved and another line
An example would be the Intertype Fotosetter.
same manner as a Linotype.
piece
that character on
front
of
a
of
it~
by
a
The
much
the
The
matrix
photographic film with the image of
Thus
the
light source, then
mechanism
in
keyboard.
matrix
is
suspended
light is strobed.
'~e·
result is a character is "written" onto the
spot..
"set".
Each character is on a separate
matrix which is "called up"
a
is
This machine
has a system of machinery that uses matrices
contains
For
film
in
The
at
that
that contains the light source then
moves a distance determined by the "width" of the character
(encoded
in
the
matrix
by
virtue
matrix) and the process continues.
of
the width of the
At the end of the line,
the camera mechanism measures the height of the accumulated
matrices to
determine
justification.
All
the
amount
justification
of
on
space
needed
this
machine
effected by "letterspacing" -- the padding out of the
by
adding
space
between
letters
in
the
Fotosetter provides the capability of various
(from
6
to
pre-focused
36
pts.)
lenses.
words.
point
for
is
line
The
sizes
by incorporating a system of eight
Otherwise,
54
the
Fotosetter
very
55
resembl~s
strongly
a
Linotype
(i.e.,
matrix
returning
mechanism, etc.).
Another new entry at the time was the
is
based
on
the
Monotype.
Monophoto.
In this case, the matrix-mat
(that contains the character images) contains
images
that
are
It
positioned
below
photographic
a lamp using the same
machinery a Monotype incorporates.
The character image
passed
then
through
a
few
lenses,
imaged
photographic medium by travelling mirrors.
mirror
that
determines
is
onto
the
Thus it is
the
the position on the film that the
character is put.
The AFT Justowriter is typesetter
f r om
an
ear 1 i e r
typesetter.
of
a
t e c h no 1 o g y ,
t hi s
that
Like the Monotype and Monophoto,
tape-punching
part
and
a
much like a typewriter.
phototypesetting unit.
a
evolved
t i me f r om a· s t r i k e-on
it
consists
tape-reading unit.
tape-puncher punched TTS type codes, while
is
also
the
The
reader
is
The typewriter part evolved into a
It is simpler in workings as
there
"standard" set value for the type, and only one size
of type.
source
The imaging is accomplished by
through
characters
shining
on a daisy wheel.
a
light
Incidently,
the wheel stops before the light is strobed.
The first second generation typesetter was the
200,
developed
characters
in the late 1940's.
through
photographic
Photon
The Photon 200 images
film
masters
that
are
arranged in concentric circles around a disc.
There are 16
fonts in the eight circles.
selected
Thus, a font
is
by
56
moving
the axis of the disc, then the character is strobed
onto the medium as it passes the light
remains
in
motion during this process.
"sized" by the use of
turret.
12
lenses
The
disc
The images may be
mounted
on
a
rotating
One of the innovations developed with this machine
is that of the "width card".
table
source.
for
character
This was essentially a lookup
widths,
and,
the
width
cards are
changed to reflect the characteristics of the fonts in use.
A competitor to the Photon
Linofilm.
200
the
Mergenthaler
This device also imaged characters by strob.fng a
light source through a photographic
case,
is
there
master,
are 18 grids of one font each.
mounted on a carousel and, when a font
and
but
in
this
The grids are
character
are
requested, the carousel rotates to the proper grid then all
the grid is mechanically masked out except the desired one.
The image may then be sized through a resident zoom lens.
Second Generation Principles
There are many similarities in the
of
For
typesetters.
negative
storing
images
the
on
font
will be selected.
instance,
all
photographic
second
generation
fonts are stored as
The
film.
manner
of
is determined by the manner in which it
Two steps are
involved
in
the
latter
process:
1) position the character in front of
source
the
2) mask out, somehow, all characters
one desired
except
Let's examine some
characters
of
the
attendant
light
the
pitfalls.
If
are exposed using shutters to mask out unwanted
characters, it is not uncommon for the shutters
either
to
stick or to clip off portions of the desired character.
In
the case where characters are selected by means
of
moving
the axis of a disc, vibrations are often induced, causing a
blur of the image.
characters
on
This
the
was
avoided
sending
"null"
tape following a font change, but this
is, of course, an undesireable
objections
by
feature.
Then
there
are
to characters that are exposed "on the fly" (as
with the Photon 200).
The
claim
here
is
that
a
faint
for
"sizing"
"tail" can be seen following the character.
There
are
almost
as
many
schemes
characters as there are typesetters.
Sizing is the task of
reducing or enlarging a character designed
certain
point
size.
The
utility
is
for
font.
Some
as
a
that one need not
maintain up to seventy different sizes of a
each
use
character
for
machines do character sizing by using a
57
58
separate image size for
fixed-size
each
font
used,
and
so
use
lens (as with some Compugraphic equipment).
this case, only certain sizes are desired, perhaps
three.
good
a
In
two
or
The resulting image is excellent, but this type is
mainly
drawback
for
of
"straight"
some
of
text,
these
i.e.,
machines
change the font strips without
galleys.
One
is that one cannot
exposing
the
photographic
medium.
Some typesetters have two lenses that may be switched,
but they must be changed manually by either unscrewing them
or revolving a turret.
phototypesetters
Aside from these examples
that
can
devices use a zoom lens.
type
in
case.
the
Most
size
are
automatically.
It is implied that
one
These
can
machines
with
zooms
can
size
at
specific
To achieve better appearance, some devices
maintain fonts in a few size ranges, the lower two
6-12 pts.
caveat
(or
another
good
on
the same line:
turrets
to be on different lines!
of
resolution
lenses
since
in
some
they
are
not
as
Thus the
as
to
simulate
adjacent
chacters
Of course there are also
devices
that
fixed-length
offer
lenses.
sometimes include special lenses for distorting the
so
An
as the characters are sized
their apparent base lines do not match.
appear
appearance.
one, I should say) is that
many machines can offer this sizing, but
characters
ranging
and 12-24 pts., respectively, so that they
can be sized within that range with
important
set
range of 6-72 points, but this is rarely the
discrete sizes.
from
the
better
They
images
italic, and others that "squeeze" the
59
characters into a different set size.
The issue
varied
of
character
treatment.
There
registration
is
a
number
First, the photograpic medium may
implying
spot.
that
the
image
Or, the image
advances
vertically
be
lens
moved
may
move
while
at the end-of-line.
A variation would
mirror
that
sweeps
the
medium as it
moves.
A
bundle
to
the
film
Then again, both
In this
case
a
up the image and channels it to a
moving lens.
fiber-optic
horizontally,
is always projected to the same
assembly
picks
received
of alternatives.
the image and film may remain stationary.
collimnator
also
page,
be
to
laying
different
convey
have
prism
or
the images on the
method
the
a
is
image
to
use
a
to the spot of
exposure.
Character escapement -- the providing of space between
successive
words
or characters, is the last major concern
of second generation typesetters.
calculating
escapement
afterwards.
the
before
escapement
this
sizing
must
be
larger
concerns,
the
If the sizing is done before the
width we were to move, since the
In
There are two
image
registration
or smaller than the true
character
is
distorted.
case, the magnification factor must be taken into
consideration in determining the distance to move.
sizing
or
is
done
after
the
the escapement, the problem is not
quite so bad since the escapement
magnified
If
of
along with the character.
the
true
width
is
The escapement values
must all be translated from the relative measurement system
of
ems
and
ens
to
the absolute machine units (internal
60
units), whether they are in parts of a meter or parts of
point.
Thus
if
we
had a 9 point font and needed to lay
down three units of space, and the machine moved in
of
a
point,
and
the
system
100ths
is 18 units to the em, the
following calculation would be performed:
9 pts
----- *
1 em
1 em
18 units
3 units
*
a
100 internal units
* ----------------point
=150 IU.
Third Generation Typesetters
The main distinction
generation
of
photographic
character
between
typesetter
medium
master
not
is
that
directly
the
third
and
second
the third exposes the
from
a
photographic
but from the face of a cathode-ray tube.
Some other distinctions are:
1.
There are no mechanical parts involved in
selection of the character.
2.
The device can manipulate the
characters
electronically (for sizing and skewing).
3.
In most of the devices, there is no need for
font masters, the fonts are stored digitally
on disc files.
4.
They are almost always driven by a computer.
5.
Some can compose full pages without moving the
paper.
The Linotron 505 may be thought of a "2.5
the
generation"
typesetter in that it scans photographic masters within the
machine.
with
The masters are arranged in a grid of 16
16 characters per plaque.
plaques
The character selection is
accomplished by producing scan lines in a relatively
patch
of space on a crt.
small
These lines are imaged through a
grid of sixteen lenses (all in the same plane), through one
character
position
out
of
each
selecting 16 characters out of 256;
of the 16 plaques (thus
actually,
238)
then
through 16 more lenses, (again, all in the same plane) onto
16 photo-multiplier tubes.
the
16
by
One character is selected
from
accepting input from only one of the 16 PMT's.
The impulses from the pmt are fed to a second CRT that sits
on
a
travelling
bed.
The automation of the scanning is
61
62
effected by the carriage sensing
per
inch);
with
calibration
lines
the sensing of each calibration line, a
new scan is begun, thus the characters are laid
vertical
strokes.
characters
There
each;
windmill-looking
the
can
be
grids
wheel.
four
are
This
by
causing
the
down
grids
mounted
with
of
238
on
a
device skews characters by
skewing the scan lines on the ouput CRT.
accomplished
(1300
output
Also,
CRT
sizing
is
to amplify the
stroke lengths by the desired magnification size.
The
technique
photographic
of
creating
images
masters has some limitations.
by
scanning
The images are
not as "clean" as they could be because in the scanning
a
of
straight line, the line is detected randomly (the signal
o s c i 1 1 at e s ) .
not
A1 so , the "p i n c us h i on e f f e c t " of the
adequately
compensated for.
CRT
is
This is handled, to some
extent, by locating characters that distortion that
has
a
minimal effect on (as some punctuation) at the edges of the
grids.
Another drawback is
uniformly,
nor
that
the
do
P~AT's
not
age
do they stay calibrated uniformly, causing
some characters to register brighter than others.
The
Videosetter
character
is
one
is
another
machine
grid within the typesetter.
character
characters
are
grid
of
projected
dissector
tube"
(like
character
selection.
a
106
onto
that
television)
face
of
which
the
All
the
the "Image
does
the
It too uses a second CRT for output,
but this CRT has a fiber optics bundle fused to
Since
a
In this case, there
characters.
the
scans
its
face.
fiber optics bundle comes into contact with the
63
photographic mediwm, there is no sizing done
This
values for the characters
directly off the character grid.
Next to each character is
bar
reads
the
lenses.
width
a
machine
with
code encoding 4 bits.
A constant, "3", is added to
each value obtained, thus giving the capability to encode a
range
of
3
to
18.
In later versions of this machine, 8
grids were added in turrets to allow larger "on-line"
font
capability.
Digital phototypesetters
differ
from
the
preceding
devices in that the character masters are scanned only once
(and usually on a different device.) The character data
then
stored
lengths.
the
in
memory
as "stroking information", or run
Characters are written one at a time
beam
on
is
by
turning
and off for limited duration defined by "on"
and "off" edges of the characters.
Characters are usually digitized in three or four size
groups
at
450
to
900
sizing is accomplished
widening
them
strokes/inch.
by
lengthening
The reason is that
strokes
and
also
by increasing the voltage to the CRT.
Thus
if a character that was digitized at 720 lines per inch
enlarged
inch.
be
50%,
it reduces the number of strokes to 540 per
Even with thicker lines, character
pushed
too
is
far
larger point sizes.
as
the
stroking
cannot
roughness can be detected at
We cannot start with an assumed
large
point size and scale down, as this will cause the character
to appear too dense and smudged.
enlarge
an
effectively.
image
by
Thus it
40-60%
and
is
reduce
possible
it
to
20-40%
64
The early versions of the RCA Videocomp
fonts
to
be
used
to
tape, then intB core.
core
and
a
required
all
be loaded at the front of the data
If there
character
of
weren't
another
enough
fonts
in
font were required, a
complete font was required to be overlayed from disc.
This
shortcoming was overcome in the 830.
Autologic developed machines that set the type as
film
advanced
past
a
speed up the process.
sizing,
so
every
These
"window" in an attempt to
machines
had
no
electronic
font was digitized separately.
not as bad as it seems
character
setting
stroking
as
data
the
in
company
a
the
also
compressed
This is
stored
format,
the
thus
packing more characters into memory.
A totally different method of producing characters was
introduced
in
the
fixed-head drum (it
however,
weren't
patches of dots.
drum,
it
Fototronic
CRT.
held
chars).
stored
1024
as
characters,
As the information was read in
from
the
was used to direct the movements of the CRT beam
Thus
it
d i dn ' t
rna t t e r
the initiation of the read took place as long as the
cycle lasted only one drum revolution.
size
The
stroking information, but as
i n "spray i n g" t he char act e r out .
where
It stored fonts on a
This machine
could
electronically and produce forward or reverse leading
in 1/10 point increments (the utility of this
will
clear in the discussion of vertical justification).
become
65
Most
phototype
information
as
setters
receive
predefined
strokes
their
character
that
is,
"stroking" takes place off line by reading what are
"perimeter"
permiters
characters.
formed
typesetter
by
reads
Perimeter
co-ordinate
perimeter
that
pairs.
The
characters
from
Metroset
disc
the
a
density.
Another
they
are
This
"look
innovation
ahead"
algorithm
characters of fonts that are not in
characters
in.
This
is
done
by
largest point size and throwing away strokes
to narrow it down.
implement
and
all the characters, no matter what the point
size, have the same stroking
assuming
called
characters are simply
generates stroking patterns as the data is read
implies
the
may
be
needed.
it
made
that
core,
so
was
to
anticipates
that
these
read-in in advance for use at the time
The
other
major
manufacturers
soon
adopted this technique.
One would think that newspapers would have been
to
use
phototypesetters,
but
they
needed
to
quick
await
technology for making plates from photographic film.
however,
they
are
they
can
include
half-tones
as
of the composition process, and they have better logo
versatility.
them
Now,
the largest users as they can have all
the type fonts on line;
part
a
to
take
The speed
of
classified
phototypesetters
advertisements
minute, thus creating more revenue.
up
also
allows
to the last
Composition
Comp/Set
500.
This
particular
typesetter
entry" style typesetter, meaning each line is
it
is typed in.
is a "direct
composed
as
Before beginning a sequence, the operator
must indicate various control parameters:
1.
point size
2•
f on t numb e r
3.
line length (in picas)
4.
leading
5.
manual or auto end-of-line decisions
6.
minimum interword space
7.
maximum interword space (expressed in relative
units)
In manual mode operation,
decides
to end the line.
In
automatic
user
types
until
he
The machine aids in the decision
by "beeping" when the operator
range.
the
has
entered
justification
mode, the operator continues typing
and the machine ends the lines automatically and pushes any
word
The
that
doesn't
drawback
of
fit to the beginning of the next line.
such
a
machine
is
that
i t
cannot
redistribute lines that have already been set (for vertical
justification).
to
move
The vertical justification may be
desired
a line from the bottom of the current page to the
top of the next.
66
67
More sophisticated processing can be
using
a
"front end" computer.
accomplished
There are two contexts for
this application -- batch and interactive.
system
by
The interactive
is similar to the Comp/Set machine described above,
except the system writes a file for
driving
rather than setting text immediately.
perform
sophisticated
dictionary.
hyphenation
a
typesetter
This system can also
using
an
exception
In addition, the user has the option to reset
the entire page using other parameters if he so desires or,
the
file
contexts.
"override"
desires.
that,
may be sent for typesetting and reused for other
The interactive system relies
hyphenation
offered
by
The mechanism is called
on
the
a
the
user
system
to
if he so
discretionary
hyphen
when inserted by the operator, takes precedence over
the system's hyphenation.
A batch system is a much larger software package
that
contains extensive logic for balancing lines on consecutive
pages and other considerations.
that
It
usually
files
have been marked up much as a RUNOFF file is, and the
imbedded commands direct the software as to
to
reads
take
for
the given context.
was discussed in class;
brackets.
Thus
its commands appear within
tf,l;ps,36
drives
a
encoded
particular
into
the
typesetter.
phototypesetter driven by 8-bit
square
instructs the program to set
names) and a point size of 36.
the program are
actions
A language called PAGE-3
the type face to "1" (which is an index
font
which
into
a
table
of
The commands given to
binary
Each
codes
language
font
reserves
used
seven
that
by a
to
68
nine
of
the codes as meta characters for providing escape
sequences from the normal text.
The Comp/Set
capability
to
typesetter
set
characters
sizes on the same
Videocomp
mentioned
line.
of
above
lacks
the
widely differing point
However,
for
machines
as
the
500, this capability must be provided for in the
composition software that outputs code
for
that
machine.
Therefore, the software scans the next line in its entirety
looking for the character with the
This
value
then
advance the CRT
becomes
bea~
the
largest
body
leading.
runount the typesetter will
when it begins writing the new line.
Getting back to justification, a
compositiori
package
can take one of four main tacks in its end-of-line decision
processing.
buffer
In all cases, words are set
until
negative.
throw
the
counter
into
the
output
of space left on the line goes
The first tack in dealing with the
line
is
to
away
that last word and attempt to justify the line
without it.
The second tack must be considered if the line
would
look to spaced-out without the last word:
the remaining
spacing
words
also.
A
on
the
third
line
using
alternative
is
space out
inter-character
to
hyphenate.
Otherwise the program should go back to previous lines
attempt
to
reset
them tighter or looser in order to make
the current line set correctly.
this
is
to
keep
and
at
least
resetting at any one time.
The
three
way
pages
Page-3
handles
available
for
69
The specification of text position on a page is also a
capability
of
batch systems.
into "blocks".
provides
A block
the
block must
bounds
be
language,
is
for
defined,
before
any
All typeset material is set
an
the
using
type
imaginary
rectangle
that
text as it is set, and the
the
commands
can be set.
within
the
There may be any
number of blocks per page, and the blocks may be positioned
anywhere
on
language.
many
the
substitute
stream.
the
page,
also
with
specifiers
within
the
Blocks may be named and used multiple times on a
page . or
allow
a
user
pages.
to
Most batch compositon systems also
define
typesetting
macros
commands
that,
and/or
Added to this is the ability to
variables
controlling
conditionally execute
statements.
language really is a language.
the
Thus
when
text
test
called,
into
various
composition,
the
the
of
and
composition
Hyphenation
Mention was made above
routine
hyphenation
justification
quality
is
routine.
incorporates
of
hyphenation
actually
Most
extension
composition
A
of
the
aesthetic
of
hyphenation, but the degree to which
one relies upon it is dependent on
newspapers
instance,
an
routines.
hyphenate
the
application.
the
most
often,
however,
high-quality books try to minimize it;
For
while
a
total
lack of hyphenation can be as disasterous as too much of it
the result is a page with "rivers" of white space in it.
We know that when we hyphenate we do so on the last word of
the line.
For this
nature of a word.
purpose,
side
hyphen
on
must
define
the
exact
A first approximation would be to define
it as any string nf non-blank
either
we
of
This
it.
either
side
characters
may
of
with
spaces
on
be extended to include a
i t
the
also
"micro-computer" may be divided at the hyphen.
word
We can also
add the em-dash and sometimes the slash.
Now that we have our word, we find we must
"clean
it
up" even more by stripping the punctuation from either side
of it (remember, this is
hyphenation
point).
for
purposes
of
discovering
Another class of characters that must
be removed from the word would be those of the
language
that
may
the user may wish
"forever".
for(fn,2]ever[fn,l]
(itallic,
say)
be imbedded in the word.
to
The
a
hyphenate
mark-up
the
commands
to specify that we
composition
For example,
sub-word
mny
switch
before setting the word "ever".
70
"ever"
appear
to
font
in
as
2
There are
71
some exceptions, but they must be handled as special cases,
for
instance,
with
6:45P.M., the P.M.
must appear with
the 6:45.
There are two main approaches to hyphenation:
and
by dictionary.
Often the two are blended.
logical
One of the
methods of the logical approach is to take a four-character
window
into
the
character
division can be made mid-way.
characters
that
will
and
will
printers).
to see whether a
·It starts with the last
two
fit on the line with the hyphen and
looks at the next two characters.
break
checks
This assumes
the
first
achieve the tightest fit (a preference of many
Thus if the word
that
is
to
be
divided
is
"photographer" and the characters "photogr" will fit with a
hyphen on the line, we analyze the
questions
are asked:
tetrad
"grap".
Three
1) can a syllable begin with ap?
can a syllable end with "gr"?
and 3) may an
be hyphenated between an "r" and an "a"?
English
word
The answer to the
second question would be no, so the algorithm moves to
left
in
2)
the
the word, character by character, posing the same
three questions until the answer to all three is "yes".
Some hyphenation routines start at the first character
of the word and work to the right, searching for a possible
breaking point.
This insures that if
we
take
the
first
breaking point, we find it will either set or it won't, but
we needn't keep trying as any other attempts
fa i 1.
clearly
will
72
Another
method
analyzes
For
pairs.
character
instance, a syllable will never end with the characters be,
bd, bg, bh •.•
be,
bf
and a syllable will never
with
bb,
Thus for each letter in the alphabet, we keep
all the illegal occurrancesy
as
begin
checking
a
This, however, is only useful
mechanism
for
some
general
hyphenation routine, as it tells us only what
Hyphenation
is
more
either side
of
a
basilic
both
purpose
won't
work.
challenging when there is a vowel on
consonant.
hyphenate
For
in
instance,
different
basil
and
fashions.
One
therefore has a 50% chance of being right.
A
less
storage-intensive
structure analysis.
suffixes
around
hyphenate.
is
to
use
word
We merely store a list of prefixes and
which
Then
method
words
easily hyphenated.
it
is
as
usually
acceptable
to
reentry, restructure, etc are
Dis-, in-, mis-,
and
un-
are
common
prefixes,
~hile
suffixes.
One still runs into trouble with such exceptions
as
inchworm,
-er, -tion, -sion, -ible, -ness are common
inkless,
union,
and unified.
resolved with more logic, but one must
the
system
logic.
is
There
to
be
somewhere
if
overburdened with hyphenation
is
also
the
words
that
hyphenate
connotations.
Examples
homographs,
different
not
stop
These may be
unresolvable
problem
differently
would
be
of
for their
minute
and
produce.
The logic above may be augmented with a dictionary
which
there
exceptions.
are
two
types,
full
dictionaries
Thus a word may be hyphenated using the
of
and
above
73
logic
then
that
such a dictionary.
syllable
If
the
or word may be checked against
word
appears
as
the
logic
hyphenated it, it is wrong and should be tried over.
It is
not unreasonable, however to maintain a full dictionary
line.
Such a dictionary may be searched in one disc access
and have a very good chance of having the
case,
there
would
be
no
need
by
keying
in
In
accepted
that
As mentioned
hyphenation
a discretionary hyphen.
done, the user's hyphenation is
analysis.
word.
for logic.
above, the user is able to override
lookup
on
logic
or
If this is
without
further
Input To A Typesetter
Because the
includes
the
"turnaround
on
many
typesetters
developing of the photographic medium, there
is a recognized
textual
time"
input
need
to
to
the
insure
the
composition
perfection
program.
of
One way to
minimize typo's is to write the equivalent of a TECO
to
the
macro
look for commonly misspelled words and correct them, an
example being "teh" for "the".
driver
file
that
contained
Such a program could read a
all
the
search and replace
options.
This is even more cogent when the input has
produced
with
an OCR machine.
The more sophisticated OCR
machines can read pages of material consisting of
various
These
fonts,
and
machines
recognizing
some
can
sometimes
an
"1"
as
been
type
even read hanwritten data.
make
"comnon"
a "1".
mistakes
as
Thus one could search for
words that have imbedded numbers (assuming the material
not
is
supposed to have that sort of construct) and fix them.
Other "format" errors
machine
can
also
be
found,
as
when
the
has inserted a space between the last character of
a word and the period following the word.
is
of
believed
dictionary
to
be
program
"suspicious"
correct,
that
it
may
underlines
When
be
input
submitted to a
words
i.e., that are not recognized.
then be scrutinized by a human for final
the
that
look
The words may
judgement.
This
added care saves time and chemicals.
The correction of errors
commands
can
be
more
in
difficult.
transform the typesetter or
person
74
the
actual
typesetting
These errors serve to
entering
the
typeset
75
commands
into
the commands.
typesetter
to
a
programmer of sorts, since he must debug
Often an incorrect mark-up command causes
abort execution in unfathomable ways, i.e.,
with no meaningful diagnostics.
as
entering
picas.
a
The error can be as simple
the column width as ten points instead of ten
In such a case, an indention
of
2
ten-point
ems
would cause unresolvable problems.
Clearly, text processing is involved in two
disparate
stages,
the data exists as "text" in a computer
memory, but also as ink on a
wherein
the
text
seemingly
is
page.
If
an
error
occurs
positioned on the wrong part of the
page b u t i s o t he r wi s e cor r e c t , i t i s mo r e p r u den t t o " pas t e
up" the output or to change the input to the typesetter and
run it again?
Both methods are employed.
In the
instance
in which we re-run, it may be because the file will be used
again to produce the same typeset output (as when the
generates
a
telephone
book).
The other method would be
used when the job is a one-time-only job or if it
in
file
is
done
a small shop that wants to spare the expense of running
the job again.
•.
Pagination
In computer typesetting, pagination is the process
of
outputting codes which cause a typesetting unit to generate
fully composed pages.
in
fact,
many
of
Looking back, one of
addressed
text.
are
This stage of the industry
the
problems
the
first
new;
are not fully resolved.
(and
simplest)
problems
by the industry was that of setting multi-column
Assume we want to set a page of three
60
is
lines
deep
each.
columns
that
We should also assume that the
leading and point-size of the lines don't change too.
The
procedure would then be to read the input file (randomly if
possible) and write the 1st, the 61st and the
across
the top line of the output page.
lines
Then the paper is
advanced within the typesetter and the 2nd,
the
121st
the
62nd
and
122nd lines of the file are written as the second line
of the
arcane
page~
a
The reason
fashion
this
must
be
performed
in
so
is that the typesetter could write only
one horizontal line at a time, with no backing-up possible.
This
scheme worked ok as long as the leading was constant.
Now suppose the first and third columns are set as 10 on 12
text
(i.e.,
10
point letters on 12 point bodies) and the
second column is set as 8 on
write
the
9.
The
program
must
then
first line, then drop down nine points to write
line two of column two.
Afterwards, it drops down
another
three points to write line two of columns one and three and
so on.
76
77
Another problem that had to be dealt with was that
the
serial
nature of typesetting commands.
if a column of text is to be set,
size
the
of
For instance,
leading
and
point
and font number are all specified, then all the words
in that column are set with those parameters.
When
column
two is begun, the appropriate commands precede it and these
commands remain in force for the duration of the setting of
the
column.
to
set
Now, if the algorithm specified above is used
t h r e. e
corresponding
co l umn s ,
to
the
t he
types e t t i n g
individual
command s
columns will be "lost".
Therefore, the program must discover and save the state
of
each column for future use.
Of course there is a way around all this computing and
that
is
to
use
a
typesetter
that
can
move the paper
backwards or a typesetter that can write a whole page at
time.
These
exist
now, but did not exist when the above
problem was first addressed.
discussion
of
a
"blocks"
for
As
mentioned
above
in
the
composing, the blocks may be
specified to sit anywhere on the page,
and
therefore
the
page may be "made up" in that fashion.
Perhaps the one driving force in page
the
widow.
A
widow
a
widow
cannot
the
page.
Now the rule
being
Since a widow
cannot
begin
programs must deal with it in one of three ways.
-~---
able
to
The reason for this is lost in the oral
annals of printing.
------
is
begin a page, and this is sometimes
extended to include two or three lines not
begin
is
is a line that is quad-left that is
immediately preceded by a justified line.
that
composition
-------
-
--~----------
-----
----
--
-------------
-
a
page,
One is to
---------
~
78
go back a page or two and reset the text either tighter
looser
so
as
to
either
pull
the
widow
back onto the
preceding page or to push an additional line
the
page
with
the
widow,
thus
making
or
it
two
vertical
justification
by
page
"carding".
the
of
lead
lines of print so as to increase the leading.
This, of course, pushes more lines onto the
The
and
The term
comes from the printer's term for inserting cards
between
onto
acceptable.
Another method would be to go back to a previous
perform
or
widow's
page.
other alternative resembles the previous one, and that
is concerned with illustrations or photographs.
amount
of
"white"
space
page, and this space may
satisfy
a
widow
surrounds
be
The
certain
an illustration on a
tampered
condition.
A
with
in
order
to
ability to do vertical
justification implies the typesetter can advance
the
page
in increments at least as small as one tenth of a point.
A
similar printer's condition, that facing pages should be of
equal depth, is handled in much the same way.
There
are
pagination.
therefore
One
few
different
ways
to
do
mentioned already is in the Page-3 style
of specifying positions
program.
a
of
blocks
from
within
a
batch
Another would be to arrange blocks interact.ively
with a "page makeup terminal" in which the operator has the
ability
whole
to move the blocks around at will, then to see the
page
before
it
is
set.
Other
special-purpose
terminals are used in newspaper advertising offices.
These
allow tne user to type in text then to enlarge or reduce it
using
special
keys.
The
characters
are "stick figure"
79
characters that are meant to give a
proportions
of
one
idea
of
the
type to another and not as the actual
representation of the type.
capability
general
These terminals also have
the
to define irregular shapes around or into which
text may be "poured".
This
completes
composition.
the
treatement
of
printing
and
The remainder of the semester was devoted to
an introduction to the technology of
office automation.
word
processing
and
The treatement of these topics from the
point of view of their impact on the
beyond the scope of this course.
office
personnel
is
Word Processing 1
The trend of what we now call
with
the
introduction
and
speed,
normal manner.
1930's
with
efficient
where
a
p 1 aye r - p i an o )
changed
to
capability
and
1at e r ,
to
merge
terms
an
both
in
the
of the Friden Justowriter.
t he
These
text
of
advance
roller-paper
in
paper-tape.
The "trend" is
in
made
introduction
This device punched and read
began
"scribe" is defined in the
Word processing
the
processing
of the typewriter.
that of making scribes more
accuracy
word
(similar
a
19 5 0 ' s , t he me d i urn was
devices
from
to
provided
the
various paper tapes, and
could also perform a rudimentary text search.
The term "word processing" was finally coined
1964
in
when
it
Typewriter (MT/ST).
storing
a
introduced
improvements
Mag
Tape
This device uses tape
textual information.
radical
the
change
from
in
IBM
Selectric
cartridges
for
While the technology was not
the
storage
by
Justowriter,
density
chars/inch vs 11 chars/inch with
(it
paper
it
could
tape),
offered
store
the
20
tapes
were recyclable, and the transport machinery (for the tape)
was faster.
Typewriter
In 1969 IBM introduced the Mag Card
(MC/ST).
Selectric
Characters were stored on a magnetic
card that could store a minimum of one page of text.
the
introduction
Until
of the MC/ST, IBM had little competition
IM~;~--~f-~h~-~aterial on word processing derives from Data
P~o
Dafa
Re~orts
on Word Processin[,
PfojResearch-Corp7;-r~r~T7
' 80
(Delran, New Jersey: -Tne
81
in the market, but since
then,
over
100
companies
have
entered the word-processing arena.
In
word
schemes
for
processor
providing
technology
the
there
are
two
main
function of a word processor:
standalone units and shared logic.
Within the standalone genre,
The
first
is
the
there
and
control
logic,
medium recorder
is
used
(with
internal
the
same platen.
within
types.
It is a
coupled
with
memory and a magnetic
The platen
paper) to generate the text and to perform
to over $10,000.
type
is
card, cassette, or diskette.
subsequent editing functions.
on
four
standalone mechanical system.
keyboard and platen wherein the keyboard
edit
are
The final document is output
These devices are priced from $5,000
Standalone display
this
genre.
In
systems
this
type
are
another
of system, the
editing and input are registered on a display screen
while
the output (i.e., the final product) goes to a daisey wheel
printer or a selectric printer.
more
internal
memory
than
These systems usually have
that
required for one page's
worth of text, so the user may "scroll" throughout
long
page,
pages.
and
Some of
provide
special
even
these
divide
offer
systems
are
user-programmable
is
system.
more
to
functions, and of these, some are capable
a
limited
normally
These devices
data-processing facility, say
tallying a column that is flagged
this
very
this long page into smaller
of being down-loaded from a magnetic medium.
sometimes
a
found
for
on
this
purpose,
but
a shared-logic type of
These systems usually price around $10,000 to over
82
$25,000.
The
third
category is really a sub-category of
the previous type of system.
"thin
window"
display
It
is
system.
called
This
idea
is
that
particular line in
category
in
the
user
memory.
can
It
low
see
offers
gas-plasma).
all
or part of a
from
the
first
that it saves paper, and has a lower purchase
end
The fourth
type
of word processing technology.
covers
It is called
the electronic typewriter and is usually an upgrade from
normal
a
or
differs
price than a full display system.
the
standalone
device
partial- or full-line display (either CRT
The
a
typewriter.
It
has
a
internal memory, but differs
from most word processors in that there is no provision for
secondary storage.
Many of the above
This
the
case
diskettes
copying
lists.
text,
to
systems
are
dual-media
stations.
in which it has provision for two tapes or
run
concurrently.
This
facilitates
both
and the merging of "boilerplate" text with address
A
being
~boilerplate"
a
section
such as for a form letter.
offer a communications
of
non-varying
Some of the systems also
interface
and
the
capability
to
similar
to
drive a higher-speed printer.
Shared
logic
time-sharing
in
word
that
number of stations,
processing
and
processors
one minicomputer is used to drive a
but
document
the
CPU
stream
is
processing.
contrasts to word processing in
"word
are
generation"
that
in
dedicated
word
Document processing
word
nature
processing includes software resembling
to
processing
while
RUNOFF,
is
document
automatic
83
abstracting
software,
and
a
table
description language
{for, perhaps, driving a phototypesetter).
"data
processing interface".
to store data as both
enables
"DP"
to
integers
search
and
text
ASCII.
This
files for needed
information, as figures from a current annual report,
Typically,
up
to
12
stations
high-speed
may
line
computer-type
share
printers,
tape
common
OCR
drives,
photo- comp interfaces.
that
etc.
stations are supported by a central
minicomputer (but there exist some that drive
The
a
This provides the capability
binary
programs
Some include
up
to
peripherals
equipment,
network
35).
such
large
as
discs,
communications
This stems from the obvious
and
point
there is a non-trivial computer controlling them all.
Such systems share not only the logic but the cost as well.
Thus
of f i c e s
wi t h
high v o 1 ume rna y be cos t e f f e c t i v e :
costs are measured on a per/station
basis,
the
price
if
is
$10,000 to $20,000 (from an overall cost of $25,000 to over
$150,000).
the
CPU
Other advantages are non-trivial:
may be used to all searching of very large files,
maintaining
updates;
page
also,
format
the
and
have
right
logic systems there
hyphenation
routines
pagination
over
numerous
printing of documents may proceed as
other text is being input.
systems
the power of
Most standalone word processing
margin
are
justification,
frequently
fairly
that uie dictionaries.
but on shared
sophisticated
A standalone
system normally asks the operator to perform hyphenation.
Some advantages of word processors follow:
1.
Higher-quality output due to better equipment.
84
2.
Output is error-free.
3.
Higher utilization of installed machines
implies a "word processing" department).
4.
Reduced typing time due to the "rub-out" key.
5.
Faster output speeds (450
higher).
6.
Drastic reduction of "repeat" typing.
7.
No retyping at all
case).
if
words
for
per
minute
photocomp
(a
In terms of operational capabilities, the gap
word
processing
quickly.
(this
and
re-run
between
and text processing seems to be narrowing
In the following
list
enumerates
some
of
the
features of many word processors.
1.
Characters may be displayed as variable-width,
thus showing where lines will break.
2.
Underscored characters are underscored on the
screen or the start and stop control codes for
underscoring are visible.
3.
Multi-column display format available.
4.
Superscripted and
so.
5.
Ability to store often-used text for insertion.
6.
Displaying of such format-control
line-spacing, pitch, etc.
7.
Stored-forms processing where the
from
one· field to the next
carriage-return.
8.
Ability to perform queued-printing, where
CPU manages the print queues.
9.
Global search and replace.
subscripted
characters
appear
characters
as
operator moves
by typing the
central
10.
Automatic alignment of numbers.
11.
Automatic page numbering, headers and footers.
12.
Document merging.
Office Automation and Electronic Mail Systems
Office automation
concept
that
seeks
may
to
be
described
as
the
incorporate discrete technologies
into a tool for establishments (corporations,
departments)
and
institutions).
"glue"
groups
of
governmental
establishments
(research
Its advent is made possible by the
growing
capabilities of word processors and/or timesharing computer
systems.
However, word processors are only element
automated
office;
the
impossible without
networks.
Thus,
the
rest
of
experience
of
which
gained
an
would
be
with
computer
technically at least, computer networks,
with message software
the
realization
of
the
a~~
editors are the
system
relies.
basis
The
on
which
linking of local
intelligence in the form of either editor programs or
processors
is
realized
in the form of electronic message
systems (also called computer message systems).
not
enough
time
reorganization
(with
two
lectures
attributable
to
to
an,
less-paper-oriented office:
This
was
if
that
not
automation
There is time
has
made
paperless,
the electronic message
the
then
system
The harbiger of the ElVIS is, in a way, the ARPANET.
network
transmitting
is
network.
a
packet-switched
network
up to lK bits in one packet.
to stress at this
also
office
left to the semester).
enough, however, to describe the tool
(EMS).
There
to dwell upon the implications of office
staff
transformation
word
point
that
an
EMS
is
capable
of
It is important
not
simply
a
It provides for temporary storage, if needed, and
performs
long-term
(up
85
to
weeks)
monitoring
of
86
The
messages.
characteristics
of
EMS's
describing are not of any one particular
collage
~00 1
of
that I will be
EMS,
but
are
a
(a message system written at SRI), HERMES
(by BBN), the DMS 2
message
system
(MIT),
the
ETHER-NET
(Xerox), and one proposed at Information International.
An EMS is a
messages
are
computer.
hardware
In
and
somewhat
really
any
misleading
in
that
the
English text kept in the memory of a
case,
software
such
for
a
system
requires
both
coordination of messages (the
hardware being the network and file
used
term
system).
An
EMS
is
to send messages of any length to other people on the
"system".
within
System in this case means a network or
which
networks
a person or concern may be addressed.
EMS one can 1) send a file created by an editor;
In an
2)
type
the message directly to the EMS by running the EMS program.
There is an. underlying assumption that one needs a terminal
to
access
messages
addressed
to
him:
in an automated
office, there is normally one terminal associated with each
secretary
or
header of the
work
station.
following
form
A message always includes a
as
the
beginning
of
the
--------fst~;~t
Cracraft, MM User's Manual (Menlo Park,
Stanford Research InstTtute~-TnternatTonal, 1979).
2A. Vezza and M.S. Broos, An Electronic Message
System:
\~ere
Does It Fit? (Reprinf rrom-ProceedTngs-or
Ca.:
!~~-T!::~!!9.~-[[9:-~Q.E:II§:~!I2.!I[-!~!~:..
Q2.J]}Q.!!!.~!. ~~I~~fg[r:-
- --
87
message text:
(supplied by the system)
(supplied by the system)
(recipient list)
(carbon list)
(subject of message)
(lines of text)
Date:
From:
To:
cc:
Subject:
Text:
Messages are normally appended to a file in the user's file
storage
area.
duplicated;
which
In
some
systems,
the
a program maintains a file
pointers
to
recipients
are
messages
of
are not
"pointers"
maintained.
in
In this
system, when all the addressees have "seen" the message, it
is
deleted
from
the
system (unles otherwise specified).
Messages may be sent to distribution
individuals
(self
included).
lists
Also,
on
and/orspecific
some "in house"
(meaning not over an external net) EMS's, messages may have
an
expiration date;
after which the addressees who didn't
respond within a set time are made known to the sender.
such
a
system,
week
are
messages
automatically
In
within the system longer that a
returned.
"appended" by an addressee.
A message
may
be
If this occurs, the message is
marked as "unread" by all others and recirculated.
Messages are accessed through a program.
action
of
such
To make
the
a program more concrete, I give here some
sample commands from~~ (SRI)3:
The user may ask for the status
"mailbox" in a number of ways.
of
messages
in
his
88
1.
Since dd:
dd.
type titles of all messages since
date
2.
Subject strin~: give titles of all messages
"string" mentioned in the subject heading.
with
3.
New: give titles of all messages sent here
program started.
since
4.
Seen: give
"seen".
marked
as
5.
Text string: titles of messages with "string"
the text body.
in
6.
Unanswered:
7.
Deleted:
8.
From string:
titles
of
all
messages
titles of unanswered messages.
messages deleted in this session
on program termination).
(they
!~~llY ~!~deleted
titles of messages from "string".
Disposition commands follow:
1.
Read n: Read message number n (all messages
your box are numbered by the message system).
2.
Answer n: EMS responds with TO:
The proper response is ALL or sender.
EMS then accepts your message either
specification or as input.
as
3.
Copy n,filename:
"filename".
into
4.
Forward n,addressee:
addressee.
5.
Delete n:
6.
List n:
7.
Mark
Mark message n as seen.
8.
Next
Type the next message.
Copies
message
Forward
n
message
a
in
file
f i 1e
n
to
Delete message n at end of session.
Print message non lineprinter.
Now to send a message one types:
l\llM) send
To:
Soloman,McLure@SRI-KL
appears after the"@".)
(the
network
address
89
text, etc.
Why would anyone want electronic mail~
Broos 4 say the average business letter costs
Vezza
and
EMS
and
Writer's time
$1.45
Secretary's time
$1.76
Postage and paper, etc. $0.59
The postage and paper would be taken care of with
the
first
two would be cut substantially.
The same study
projects that the Post Office delivered 9 billion pieces of
mail
that
were writer-to-reader in nature.
The advent of
electronic mail to households would cut the costs
of
this
organization substantially.
Then there is a question of privacy -- for the message
originator.
In
business, the secretary of the originator
sees the message, as does the receiver's
secretary.
With
electronic mail, this is circumvented.
Ease of access is another
access
his
mail
carry his own
on
from
point.
An
executive
can
anywhere there is a terminal --or
business
trips.
This
allows
him
to
swiftly answer any pressing business at the "home office".
Other obvious attractions are electronic speed of
messages,
the
receiver
messages, and the computer
need
may
the
not be "present" to receive
be
composing and storing the message.
used
as
a
tool
for
93
to generate intelligent composition software.
fonts
and
Lectures
normal printing techniques should remain;
provide the
~easoning
for
other
composition
on
they
concerns
as
kerning and relative character widths.
The time saved could be employed in lectures
semantic
concerns
editors,
composition
languages)
and
as
general
base
and
general
publishing.
The
currently under development by companies
International,
Inc.
for
such
macro processing (i.e., in
languages
data
on
like
computer
latter
is
Information
use in composing text shared by
composing and general data processing programs.
The
text
would be described by its schema entry in the DBMS in terms
of its semantic meaning but also in terms of its part in
document,
as
a
"paragraph text".
under
"running
head",
"paragraph
As mentioned, the software is
head",
a
or
currently
development, but by the next offering of this class,
the details may be more formalized.
It is difficult to restrain the content of this course
so
as
not to attempt lengthly discussions of all branches
of Computer Science.
For instance, the
Office
Automation
material may be expanded (for the class's edification) to a
detailed
discussion
queueing
theory.
on typesetting
discussions,
of
may
page.
the
allow
not enough.
alone could fill a semester.
upon
networks
and
Once again, some restraint in lecturing
hardware
but
packed-switching
preceding
more
time
for
such
The Office Automation topic
A
sample
curriculum,
based
reflections, appears on the following
90
Professor Ed Fredkin at MIT 5 suggests additions
system
a
that supports EMS in the form of a calendar program
that reminds the user of pre-defined
on-line
to
databases
and
meetings,
light computing.
access
to
Obviously, these
are elements that could comprise the automated office.
--------gi~;;;~ Fredkin,
(Cambridge:
Artificial
unpublished).
The MIT Comerehensive Svstem,
-rnteTTTgence -tafioraTory~~~T,
Conclusion
When the
editor
This
as
class
part
stemmed
"v a 1u a b 1 e"
expressed
of
from
c 1 as s
shock
at
was
the
anxiety
t i me
that
to
on
my
spec i f y
of
me
part
it
and
capabilities.
embroiled
asking
spent.
suggesting
Members
in
p r e f e r en c e s .
necessity,
of
heated
a
required
proposals of others.
from
supply
fighting
sup r em a c y ,
method
display)
sometimes
each
the
became
for
their
g r o up ,
of
to offer counter examples to the
This process
one
(or
class
in
In
need to offer a
to
debate
using
The
commands
the
In f i g h t i n g for
was
proceedings
for
of
by c orrm i t tee •
certain class of capabilities, say print-out
commands,
the
the project, I was a bit discouraged.
reflection, however, it was time well
employed
specifying
of
transformed
the
class
lecture to one of debate.
From
these debates came ideas that none of us individually would
have
offered.
For
instance,
abstrace data type was born.
the
the
idea
of a Q-register
This essentially made each of
Q-registers a buffer, and there would be no difference
between editing the "buffer" and a
Q-register.
A
simple
context switch could provide the generality.
It is appropriate to mention here
ideas
some
of
the
developed in those discussions were not assigned due
to the size of the overall project;
that
that
but is was made
clear
students would receive "extra credit" if they were to
implement any extra features.
Some students
favorite
ability
feature
being
the
commands on an input line.
91
to
did
enter
so,
the
multiple
92
While the decision to require the class
the
editor
to
implement
project in PASCAL appears not to have hampered
the implementation, it did hamper the flow of the class
in
that
On
it brought about four or five unplanned lectures.
closer
examination,
discussion
of
handling.
in
the
however,
one
it
manufacturer's
allowed
a
detailed
attitude
towards text
The CDC Cyber 174 is computationally
I ight
of,
say,
Digital
fast,
but
Equipment software, its
implementation of text-handling tools is not adequate.
particulars
are
given
in
the
main text.
In discussing
features of PASCAL we were able to discuss the
the
language
characters
ability
to perform text processing tasks.
data type was found to be useful
and
thereby
in
creating
lessening
the
The
of
The "set"
classes
amount
of
of
code
generated.
I think it is a good idea to require the project to be
written in one particular language for the reason mentioned
earlier:
the discussion of implementation problems remains
homogeneous.
That there need be such discussion at all may
be questioned until
hadn't
sufficient
editor
w i t h out
insufficient
one
realizes
programming
s orne
many
of
experience
d i r e c t i on •
Th a t
is,
the
students
to develope an
there
was
knowledge of such standard data structures as
linked-lists.
In teaching this course again,
condensation
I
would
recommend
of much of the typesetting hardware lectures.
However, some knowledge of the hardware, even the
versions,
a
earliest
is necessary to appreciate the methods developed
94
A Sample Curriculum For Text Processing
Although the course lasts 15 weeks (45 meetings),
one
week is left unspecified for expansion.
1.
(6 meetings) Introduction
Introduction to "text processing" through use
TECO as an editor and progrrunming language.
2.
of
(6 meetings) Survey of editors
The class is to specify an editor.
This is
followed by d.iscussion of implementation details,
including the source language.
3.
(5 meetings) Pattern Matching
Details of simple pattern matching as well as
TECO's method. May also cover search-optimization
techniques and/or generalized search techniques.
4.
(6 meetings) Formatting Programs
Internal details of text formatting programs.
This
lays
groundwork
for
the "composition
programs" discussion later.
5.
(5 meetings) Typesetting
Includes a survey of typesetting devices (and
their capabilities) and a discussion of printing
in general (i.e., characteristics of type).
6.
(6 meetings) Composition Programs
A discussion of composition programs comparing
batch-oriented systems to interactive systems.
Should
include
discussions
on
hyphenation
techniques, page layout and datab~se publishing.
7.
(5 meetings) Word Processing and Office Automation
A discussion of standalone vs.
shared-logic WP
systems.
Office
Automation as a hybrid of
word-processing technology
and
networks.
A
discussion
of
such networks as ARPANET and
CHAOSNET. Also Electronic Message Systems and how
they differ from networks.
8.
(3 meetings) Macro Processors
discussion
of
macro
processors,
their
A
implementation, and their place in text editors
and assembly and composition languages.
95
One point I emphasized and one I feel is important
that
of
hand,
the
it
dual
role
requires
text processing assumes.
the
logic
of
complex
On one
hardware
software in order to "compose" text onto a page.
"text"
assumes
i~
dimension
of
data
text-processing
processing.
application mentioned above is a case
form
On the other hand,
a vehicle of information, and so
a
and
Printi_ng,
and by extension, text processing is therefore an art
that requires intuition and dilligence.
is
in
The database
point.
In
its
purest form, the text stores information that is not easily
represented
containing
by
integers.
real-estate
For
instance,
a
database
listings captures information such
as owner's requirements for assumption of the title.
requirements
are
rarely
amenable to standard codes.
requirement could be "the buyer must pay 20% down
remove
One
not
the rose bushes from the front yard." This is not a
standard stipulation that can be
number
and
These
for
"20%
down"
and
so
represented
must
by
a
coded
be included in its
o r i g i n a 1 f o rm.
Text processing is therefore
calligraphy.
information
theory
and
Bibliography
Boyer, Rober t , and J .
S t roth e r Moore .
"A fa s t S t r i n g
Searching
Algorithm,"
Communications
of
the
Association for Comnuting:-!Macfifnerv~--20:7[2-77~~
Ocfooer~-rg7v~-
---~----
--------~
Cracraft, Stuart.
lVllVI User's Manual Menlo Park, Ca.:
Stanford Researcn--Tnstitufe~-International, 1979.
28 pp.
Data Pro ReQ_orts on Word Processing:. A Report Prepared by
---Tfie -nafa--Pro--Researcn-Corp. Delran, New Jersey:
The Data Pro Research Corp., 1979. 394 pp.
§l~~!!£~~~ Ra~2 §t~pc!!2=22;l~~l ~~~tl~Q~~g ETq~l£~~~Tt ~~!~~!·
ft
repareu uy Fros~. anu Su T1van,
Frost and Sullivan, Inc., 1977.
epor~.
York:
nc.
l'lew
Fredkin, Edward. The MIT Comnrehensive Svstem, Cambridge:
Artificial ___ -rnteTTtgence ____ Liooratory,
MIT,
unpublished.
Grogono, Peter. Programminz in PASCAL.
Reading, Mass.:
Add i s on-VvesT e y-PuoT 1 s fiT n g-co:~- I n c . , 1 9 7 8 • 3 5 9 p p .
Horrowitz, Ellis, and Sartaj Sahni. Fundamentals
structures.
Woodland Hills,-carrrornTa:
Science-Press, Inc., 1976. 564 pp.
of Data
Compufer
Kernighan, Brian W., and P.J.
Plauger.
Software Tools.
Reading, Mass.: Addison-Wesley PubTTsnTng-Company,
1976. 338 pp.
Seybold, John.
Penn.:
Fundamentals of Modern Com2osition.
SeyooTa-PuoTicatTons~-Inc:,-r~vv:-
Vezza, A. and M.S.
Where Does
Tfie--Trenas
NefworJ<s:--
96
Media,
394 PP·
Appendix A
III 1070 EDITOR USER GUIDE COMMAND LIST BY GROUPS
97
98
III 1070 EDITOR USER GUIDE COMMAND LIST BY GROUPS
I•
CURSOR POSIT I ON I NG COMMANDS
B
nC
J
nL
nR
nU
nStext$
nNtext$
I I.
SIMPLE TEXT MANIPULATION COMMANDS
At ext$
nD
Etext$
It ext$
nK
Otext$
M
Bottom of page
Move n Characters forward (-n reverses)
Jump to top of page
Move n Lines down (-L moves up)
Move n characters backward {same as -nC)
Move n Lines Up (same as -nL)
Search current page for 'text' (nth occurrence)
Search entire file for 'text' (nth occurrence)
Append 'text' to right of cursor.
Delete n characters in current line (n.GT.O only)
Exchange 'text' for character
Insert 'text' to left of cursor
Kill n lines including current line (n.GT.O)
Overwrite character(s) with 'text'
Merge next line to end of current line
III. Q-REGISTER
C~~S
(%is Q-register A,B,C,D,E,F or G)
Load% with n lines beginning at character
Load% with text between copy-cursor pair
't' and '1'
Set left copy-cursor '~' to current character
Q<
Q)
Set right copy-cursor •l' to current character
Clear copy-cursor marks •{• and •l'
QZ
Kill text between copy cursor marks •l• and •I'
QK
Get contents of % and insert to left of character
Gl6
QI%text$ Insert 'text' into%
QT%
Type contents of% (follow by typing 'T')
n:XOA>
QX%
IV. FILE MANIPULATING
PN
PC
PX
PA
PM
PK
Turn page
Write out entire file, continue running EDITOR
Write entire file, EXIT to monitor
Append the next page of the file to your buffer
and insert a "NOT" L
Append the next page of the file to your buffer,
inserts nothing extra.
Release files and restart editor. Q-registers
are preserved.
V. OTHER USEFUL
F
vs
VT
T
nT
C~MNDS
CO~~S
Force case change of current character
Toggle switch for case-sensitive search
To~gle switch for case of TTY
** *Special--refresh display on screen********
Display n lines on screen, minimum=3
SPECIAL KEYS WHICH ACT AS COiV1lVIANDS IN COl\1MAND MODE
99
COMMAND
L
c
R
u
I
?
KEYS
L key,Carriage return, line feed, down
on keypad
C key,Space bar, right arrow on keypad
R key,Rubout, back arrow on keypad
U key,Up arrow on keypad
I key,Tab key (inserts tab too)
? key,Types this message
arrow
A NOTE ABOUT CHARACTER CONVERSION:
The, character allows the creation of any
character in the output file. During output (,letter)
is converted to corresponding control character.
(,number) is converted to the corresponding character
in the range of 170-177: i.e • .,5 becomes a) .
Appendix B
A Sample "Read" Program
100
101
%$D-%$L-\(*,NO DEBUGGING, NO LIST*)
(* PROGRAM TO ~~KE LIST IN CORE OF A LINE OF TEXT
AND THEN TYPE IT OUT TO TTY, UNTIL EOF SEEN *)
TYPE
NODE
=
LINK
=
PACKED RECORD
NEXT: 'NODE
PREV: 'NODE
DATA: CHAR
END;
'NODE
(* DEFINE NODE STRUCTURE *)
(*POINTER TO NEXT NODE IN LIST*);
(*POINTER TO PREVIOUS NODE*);
( * ONE CHARACTER OF LINE * )
(*DEFINE POINTER TO THIS TYPE*);
VAR
LBEG,LEND: LINK(* POINTERS TO BEGINNING, END OF LIST*);
STATUS : INTEGER (*FOR RETURNING STATUS FROM PROCEDURES*);
CH : CHAR
( * VARIABLE TO READ CHARACTER INTO *);
PROCEDURE READLINE (VAR BEG,EN : LINK; VAR STAT:INTEGER);
(* READLINE CONSTRUCTS A LINKED LIST AND SETS POINTER
VARIABLES BEG AND EN TO POINT TO THE FIRST AND LAST
NODES OF A LIST IF EOF IS ENCOUNTERED BEFORE ANY CHARACTER, STATUS IS SET TO 1 AND A NIL LIST IS RETURNED,
OTHERWISE STATUS=O. THIS ROUTINE READS PAST THE END-OFLINE ON ENCOUNTERING IT AT THE BEGINNING OF THE
PROCEDURE. *)
VAR
P : 'NODE
(*A LOCAL POINTER VARIABLE FOR "NEW" *);
BEGIN
BEG:=NIL; EN:=NIL; STAT:=O; P:=NIL;
IF EOF ( INPUT)
THEN STAT:=1 (* RETURN EOF *)
ELSE
BEGIN
IF EOLN( INPUT)
THEN READLN(INPUT) (*GET OVER POSSIBLE EOL *);
IF EOF(INPUT) THEN STAT:=1 (* IF WAS LAST LINE *)
ELSE BEGIN
WHILE NOT EOLN(INPUT) DO
BEGIN
(* CREATE A NEW NODE FOR EACH CI~ *)
NEW(P);read(input,ch); P'.DATA:=cht
IF (BEG=NIL) (* N.B. THERE IS NO *J
THEN
(* CARRIAGE-RETURN NODE *)
BEGIN
BEG:=P;EN:=P (* INITIALIZE LIST *)
END
ELSE
BEGIN
(* ADD TO AN EXISTING LIST *)
EN' . NEXT : =P ;
pI • PREV: =EN;
pI .NEXT: =NIL;
EN:=P
END
END;
101
%$D-%$L-\(* NO DEBUGGING, NO LIST *)
(* PROGRA~ TO ~~E LIST IN CORE OF A LINE OF TEXT
AND THEN TYPE IT OUT TO TTY, UNTIL EOF SEEN *)
TYPE
NODE =
PACKED RECORD
NEXT: 'NODE
PREY: 'NODE
DATA: CHAR
END;
LINK = 'NODE
(*
(*
(*
(*
DEFINE NODE STRUCTURE *)
POINTER TO NEXT NODE IN LIST * ) ;
POINTER TO PREVIOUS NODE*);
ONE CHARACTER OF LINE * )
(* DEFINE POINTER TO THIS TYPE *);
VAR
LBEG,LEND: LINK(* POINTERS TO BEGINNING, END OF LIST*);
STATUS : INTEGER (*FOR RETURNING STATUS FROM PROCEDURES*);
CH : CHAR
( * VARIABLE TO READ CHARACTER INTO *);
PROCEDURE READLINE (VAR BEG,EN : LINK; VAR STAT:INTEGER);
(* READLINE CONSTRUCTS A LINKED LIST AND SETS POINTER
VARIABLES BEG AND EN TO POINT TO THE FIRST AND LAST
NODES OF A LIST IF EOF IS ENCOUNTERED BEFORE ANY CHARACTER, STATUS IS SET TO 1 AND A NIL LIST IS RETURNED,
OTHERWISE STATUS=O. THIS ROUTINE READS PAST THE END-OFLINE ON ENCOUNTERING IT AT THE BEGINNING OF THE
PROCEDURE. *)
VAR
P : 'NODE
(*A LOCAL POINTER VARIABLE FOR "NEW"*);
BEGIN
.
BEG:=NIL; EN:=NIL; STAT:=O; P:=NIL;
IF EOF(INPUT)
THEN STAT:=1 (* RETURN EOF *)
ELSE
BEGIN
IF EOLN(INPUT)
THEN READLN(INPUT) (*GET OVER POSSIBLE EOL *);
IF EOF(INPUT) THEN STAT:=1 (* IF WAS LAST LINE *)
ELSE BEGIN
·wHILE NOT EOLN( INPUT) DO
BEGIN
(* CREATE A NEW NODE FOR EACH CHAR *)
NEW ( P ) ; r e ad ( i n p u t , c h ) ; P ' . DATA : =c h t
IF (BEG=NIL) (* N.B. THERE IS NO *J
THEN
(* CARRIAGE-RETURN NODE *)
BEGIN
BEG:=P;EN:=P (* INITIALIZE LIST *)
END
ELSE
BEGIN
(* ADD TO AN EXISTING LIST *)
EN' . NEXT: =P ;
P' .PREV:=EN;
p I • NEXT : =NIL ;
EN:=P
END
END;
102
END(* OF EOF ELSE*);
END(* OF EOF ELSE*);
END(* OF PROCEDURE*);
PROCEDURE WRITELINE (VAR BEG,EN:LINK; VAR STAT:INTEGER);
(* SIMILAR TO READLINE, BUT SIMPLER: RETURNS A "1" IF
PASSED A NULL LIST, OTHERWISE STAT=O. WRITES OUT
THE CHARACTERS OF THE LIST ONE AT A TIME, THEN
OUTPUTS A CARRIAGE-RETURN *)
VAR
POINTER: 'NODE;
BEGIN
STATUS:=O;
IF (BEG=NIL)
THEN STATUS:=!
ELSE
BEGIN
POINTER:=BEG;
REPEAT
WRITE(TTY,POINTER'.DATA);
POINTER:=POINTER'.NEXT
UNTIL (POINTER=NIL);
WRITELN(TTY);
END;
END;
BEGIN
( * BEG IN THE !\'lAIN ROUTINE *)
WHILE (STATUS=O) DO
BEGIN
READLINE (LBEG,LEND,STATUS);
IF (STATUS 0)
.
THEN WRITELN(TTY, 'EOF SEEN') (*HERE IF ERROR RETURN*)
ELSE
BEGIN
WRITELINE (LBEG,LEND,STATUS);
IF (STATUS=!)
THEN VVRITELN(TTY, 'Line is NIL');
END;
END;
END.
f
'
Appendix C
CDC Character set
103
104
TABLE B-1.
I
TIME-SHARING 64-CHARACTER SET
CORRESPONDENCE CODE TERMINALtt
INTERNAL
APL PRINT
STANDARD PRINT
DISPLAY CODE
CODE
CODE
CODE
CODE
l CHAR (7-BIT OCTAL (6112 ·BIT OCTAL)
CHAR · (B·BIT
OCTAL l CHAR. (B·BIT OCTALl CHAR. (7-BIT OCTAL
:
00 ttt
121
:
153
276
072
:
:
01
171
171
A
101
A
341
A
A
. 166
02
B
166
B
342
102
B
B
03
172
172
c
c
143
303
c
c
052
04
052
0
104
344
0
0
0
05
112
E
112
145
E
E
305
E
06
163
F
163
F
F
146
306
F
07
043
043
G
G
347
107
G
G
"046
046
10
H
350
110
H
H
H
I I
I
031
031
151
I
311
I
I
12
103
103
J
152
J
J
312
J
13
032
K
032
353
K
K
113
K
14
106
106
L
154
L
L
314
L
15
141
M
141
355
M
M
115
M
16
122
N
122
356
N
116
N
N
17
105
0
105
157
0
3_17
0
0
p
20
p
013
p
p
013
360
120
21
133
Q
133
161
Q
321
Q
Q
22
051
R
051
162
322
R
R
R
23
045
045
s
363
s
123
s
s
24
002
T
002
164
T
324
T
T
25
062
062
u
365
u
125
u
u
26
061
v
061
366
v
v
126
v
27
165
w
165
167.
w
w
327
w
30
142
X
142
170
X
330
X
X
y
y
31
y
147
y
147
371
131
124
32
124
372
132
33
144
0
144
060
0
0
060
0
34
040
I
040
261
I
261
I
I
35
020
2
020
262
2
2
262
2
ISO
36
3
160
063
3
063
3
3
37
4
004
4
004
264
4
264
4
40
010
5
5
010
065
5
065
5
41
130
6
6
130
066
6
066
6
42
150
7
7
150
267
7
267
7
43
070
070
8
8
270
8
270
8
44
064
064
9
9
071
9
071
9
ASCII CODE TERMINALt
APL PRINT
STANDARD PRINT
+
-
*I
(
I
s
=
t
tt
ttt
053
055
252
257
050
251
044
275
+
-
*
I
(
I
$
=
z
z
z
z
055
275
120
257
053
252
374
245
+
-
*I
(
I
s
=
I
023
067
070
007
064
144
004
023
+
-
*I
(
I
a
=
067
067
013
007
153
Ill
171
010
45
46
47
50
51
52
"53
54
I
THE OCTAL CODES LISTED FOR ASCII CODE TERMINALS ARE SHOWN WITH EVEN
PARITY (NORMAL)
THE OCTAL CODES LISTED FOR CORRESPONDENCE CODE TER-MINALS ARE SHOWN
WITH ODD PARITY (NORMAL)
USE OF THE COLON IN PROGRAM AND DATA FILES WILL CAUSE PROBLEMS. THIS IS
PARTICULARLY TRUE WHEN IT IS USED IN PRINT AND FORMAT STATEMENTS.
I
Appendix D
Set Manipulation Program
105
106
(*$D-,L+*)
(* Program to demonstrate a method for defining
sets consisting of characters. Written on a
DEC-10 *)
type
uletter = 'A' •• 'Z';
digit = '0' .. '9';
alset =set of uletter (*upper case chrs *);
var
al,all : alset;
ch : char;
begin
al:= 'A' .. 'C';
all:= 'X' .. 'Z'
ch:='C';
i f ( c h i n a 1 ) OR ( c h i n a 11 ) t hen
wri teln(tty, 'ch is indeed in al or all')
else writeln(tty,'in neither set');
if ch in 'A', 'B', 'C' then begin
case ch of
'A':writeln(tty,'was an A');
'B':writeln(tty,'was a B');
'C':writeln(tty,'was a C')
end;
end;
end.
Appendix E
Editor Specification Followed By Its RUNOFF Source File
107
108
The consensus concerning editor functions
This
editor
will
be
line-oriented; therefore,
the
smallest addressable runount of text is the line.
The editor is written in PASCAL.
As
a
result,
such
common file operations as specifying an input file nrume and
an output
file
possible
(that
name
from
within
the
program
are
not
is, we don't know how to do it, and no one
at DIS does either).
So for the time being, filenames must
be specified at compile time.
The commands to this editor consist
each,
followed
of
by its parameters, if any.
one-character
Any characters
in parentheses immediately following the one letter command
are
given
as mnemonic interest and should not be typed as
part of the command.
A list of the
agreed-upon
commands
and
conventions
follows:
1.
2
The cursor may be thought of as an integer
ranging from zero (in the event of an
empty buffer only) to "L" where "L" is the
number of lines in the buffer. Thus the
character $, as mentioned below, has the
current value of "L".
0
The period character may be used in the
stead of a numeric argument to signify the
cursor position (i.e., the current line).
3.
$
The dollar-sign may be used in the stead
of a numeric argument to signify the last
line's address.
4.
L(ist) m
Print out line 'm' of the buffer onto
terminal
the
109
5. · L(ist) m,n
Print out lines 'm'
through
buffer onto the terminal
'n'
of
the
6.
P(rint) n
Print onto the terminal the current line
and the following m-1 lines if m is
positive. If m is negative, print the
previous m-1 lines in addition to the
current line.
7.
P(rint) m,n
Print onto the terminal the current line
and the previous m lines and the following
nor n-1 lines, however the implementor
decides.
8.
T(op)
This command positions the cursor
first line of the buffer.
9.
B(ottom)
Position the cursor at the
the buffer.
last
at
the
line
of
10.
D(own)
Move the cursor to address
following the current line.
the
line
11.
U(p)
Move the cursor to address
previous to the current line.
the
line
12•
K(ill) n
Delete (or "kill") n lines starting at the
current line.
If n not present, delete
one line.
13.
K(ill) m,n
Kill relative to the cursor. So kill the
same lines that would be displayed with a
P m,n command.
14.
I(nsert)
Insert mode accepts lines that have at
least one character on each one. Each
such line is inserted above the cursor as
it is read. A line with only one space on
it is considered a blank line, and the
space is not inserted into the buffer:
only the line node signifies the presence
of a null line.
If the line is of zero
length on read-in, insert mode is exited.
15.
A(ppend)string
Append mode is similar to insert mode
except that lines are added to the buffer
following the cursor line.
110
16.
M(erge)
Concatenate the line following the cursor
line to the end of the cursor line.
17.
S(earch)text
The search command attempts to find a
match of pattern "text" in the buffer by
checking each line, commencing with the
line after the cursor line. No matches
are possible around end-of-lines.
No
delimiter is needed.
18.
E(nd)string
The end-of-line command
searches
the
current line for string "string" and in
the event of a successful search,
it
breaks the line following the string.
Breaking the line is defined as making the
remainder of the line a new line that
follows the current line.
In the event
the string comprises the last characters
on the line, the new line is blank.
Thus
if we were to type "Ethe " to the program
for the line on which "Ethe " appears
above,
the new line would begin with
"program".
19.
C(hange)/string1/string2/
The change command is valid only for the
current line.
The character "/" used
above signifies any delimiter character,
being by convention the first non-blank
character following the "C" command.
The
useage is to find the first occurrance of
stringl and replace it by string2.
20.
Qxn
This command puts the cursor line and the
next n-1 lines following into Q-register
"x" where "x" is one of the 26 letters of
the alphabet.
It's implementation should
put a co~x of these lines into
the
Q-register, not a pointer to these lines.
The useage could be to save the content of
the lines then kill the lines from the
buffer. The lines can be re-inserted into
the buffer at a different position with
the "G" command. The "n" is optional.
21.
G ( e t) xn
22.
X
The eXit command causes the contents of
the buffer to be written to the output
This command inserts a copy
of
the
contents of Q-register "x".
If "n" is
present, this command is ·repeated so as to
get n copies in all. The "n" is optional.
111
file and the program terminated.
23.
Z(ed)
The Zed command causes the program to be
terminated without writing the contents of
the buffer to be written to the output
file. This command is useful if a drastic
mistake has been wrought, and the best
course of action is to restart editing
from scratch.
24.
H(elp)
Last but not least is the help command.
Typing an 'H' to the program causes a
brief list of these commands to be output
to the screen for the user's benefit. The
description of each command should be
worded in the spirit of refreshing the
user's memory of available commands and
not as a tutorial.
112
.autoparagraph
.spacing 2
.ju;.fill
. lm 0; • rm 59
The consensus concerning editor functions
This editor will be line-oriented therefore, the
smallest addressable amount of text is the line.
The editor is written in PASCAL. As a result,
such common file operations as specifyin~ an input file
name and an output file name from &within & the program
are not possible (that is~ we don't know how to do it, and
no one at DIS does either}. So for the time being,
filenames must be specified at compile time.
The commands to this editor consist of
one-character each, followed by its parameters, if any. Any
characters in parentheses immediately following the one
letter command are given as mnemonic interest and should
not be typed as part of the command.
A list of the agreed-upon commands and conventions
follows:
.no justify
.list 1
• lm 0; • rm 59
• lm 10
• rm 52
• 1e
The cursor may be thought of as an integer ranging from
zero (in the event of an empty buffer only) to "L" where
"L" is the number of lines in the buffer. Thus the
character $, as mentioned below, has the current value of
"L" .
• lm 9; • rm 52
.le
.
:-1m 10
• rm 52
The period character may be used in the stead of a
numeric argument
to signify the cursor position (i.e., the current line) •
• lm 0; • rm 59
. lm 9
• rm 52
.le
$
:-1m 10
• rm 52
The dollar-sign may be used in the stead of a numeric
argument to signify the last line's address .
• lm 0; • rm 59
• lm 9
.le
L(ist) m
• lm 10
• rm 52
Print out line 'm' of the buffer onto the terminal
113
• lm 0; • rm 59
• lm 9
.le
L( ist) m,n
.lm 10
.rm 52
Print out lines 'm' through 'n' of the buffer onto the
terminal
.lm O;.rm 59
.lm 9
• 1e
P(rint) n
.lm 10
• rm 52
Print onto the terminal the current line and the
following m-1 lines if m is positive. If m is negative,
print the previous m-1 lines in addition to the current
line •
• lm O;.rm 59
• lm 9
.le
P(rint) m,n
• lm 10
. rm 52
Print onto the terminal the current line and the previous
m lines and the following nor n-1 lines, however the
implementor decides •
. lmO;.rm59
.lm 9
.le
T(op)
.lm 10
. rm 52
This command positions the cursor at the first line of
the buffer •
• lm O;.rm 59
• lm 9
• 1e
B(ottom)
.lm 10
.rm 52
Position the cursor at the last line of the buffer .
• lm 0; • rm 59
• lm 9
.le
D(own)
.lm 10
.rm 52
Move the cursor to address the line following the current
l i ne •
. lm 0; • rm 59
• lm 9
.le
U(p)
. lm 10
• rm 52
Move the cursor to address the line previous to the
current line.
114
• lm 0; • rm 59
.lm 9
.le
K(ill) n
.lm 10
• rm 52
Delete (or "kill") n lines starting at the current line.
If n not present, delete one line •
• lm O;.rm 59
• lm 9
.le
K(ill) m,n
.lm 10
• rm 52
Kill relative to the cursor. So kill the same lines that
would be displayed with a P m,n command •
• lm 0; • rm 59
• lm 9
.le
I(nsert)
• lm 10
.rm 52
Insert mode accepts lines that have at least one character
on each one. Each such line is inserted above the cursor
as it is read. A line with only one space on it is
considered a blank line, and the space is not inserted
into the buffer: only the line node signifies the presence
of a null line. If the line is of zero length on read-in,
insert mode is exited •
• lm 0; • rm 59
. lm 9
.le
A(ppend)string
• lm 10
.rm 52
Append mode is similar to insert mode except that lines
are added to the buffer following the cursor line •
• lmO;.rm59
. lm 9
.le
M(erge)
.lm 10
• rm 52
Concatenate the line following the cursor line to the end
of the cursor line •
. lm 0; • rm 59
.lm 9
.le
S(earch)text
.lm 10
• rm 52
The search command attempts to find a match of pattern
"~ext" in ~he buffer by checking each line, commencing
with the line after the cursor line. No matches are
possible around end-of-lines. No delimiter is needed •
• lm 0; • rm 59
• lm 9
.le
115
E(nd)string
.lm 10
.rm 52
The end-of-line command searches the current line for
string "string" and in the event of a successful search,
it breaks the line following the string. Breaking the line
is defined as making the remainder of the line a new line
that follows the current line. In the event the string
comprises the last characters on the line, the new line
is blank. Thus if we were to type "Ethe#" to the program
for the line on which "Ethe#" appears above, the new line
would begin with "program" •
• lm 0; . rm 59
.lm 9
.J e
C(hange)/string1/string2/
.lm 10
• rm 52
The change command is valid only for the current line.
The character "/" used above signifies any delimiter
character, being by convention the first non-blank
character following the "C" command. The useage is to
find the first occurrance of string1 and replace it by
string2 .
• lm 0; • rm 59
. lm 9
• 1e
Qxn
.1m 10
.rm 52
This command puts the cursor line and the next n-1 lines
following into Q-register "x" where "x" is one of the 26
letters of the alphabet. It's implementation should put
a t&copy\& of these lines into the Q-register, not a
pointer to these lines. The useage could be to save the
content of the lines then kill the lines from the buffer.
The lines can be re-inserted into the buffer at a
different position with the "G" command. The "n" is
optional •
• lm 0; • rm 59
• lm 9
• 1e
G( e t) xn
• lm 10
• rm 52
This command inserts a copy of the contents of Q-register
"x". If "n" is present, this command is repeated so as
to get n copies in all. The "n" is optional •
• lmO;'.rm59
• lm 9
.le
X
.lm 10
• rm 52
The eXit command causes the contents of the buffer to
be written to the output file and the progrrun terminated .
• lm 0; • rm 59
• lm 9
116
• 1e
Z(ed)
.lm 10
.rm 52
The Zed command causes the program to be terminated
without writing the contents of the buffer to be written
to the output file. This command is useful if a drastic
mistake has been wrought, and the best course of action
is to restart editing from scratch •
• lm O;.rm 59
.lm 9
.le
H(elp)
.lm 10
.rm 52
Last but not least is the help command. Typing an 'H' to
the program causes a brief list of these commands to be
output to the screen for the user's benefit. The
description of each command should be worded in the
spirit of refreshing the user's memory of available
commands and not as a tutorial .
• els
Appendix F
"FINDL" Pattern-Matching Program
117
118
(* FILE FINDL.PAS
ALGORITHM 'FIND' USING A LIST
FOR PATTERN STRING *)
(*$D+,l-*) (* SET DEBUGGING, LIST OPTION SWITCHES
INSIDE COMMENT *)
( * PROGRAM TO TEST THE SEARCH ALGORITHM "FIND"
(SEE BELOW).
(* note that a "'" replaces an up-arrow in this
listing as that character is not on print-wheel
of the diablo *)
SEARCHES A LIST IN CORE OF A LINE OF TEXT. *)
TYPE
NODE =
PACKED RECORD
(* DEFINE NODE STRUCTURE *)
(*POINTER TO NEXT NODE*);
NEXT: 'NODE
PREV: 'NODE
(*POINTER TO PREVIOUS NODE*);
( * ONE CHARACTER OF LINE * )
DATA: CHAR
END;
(*DEFINE POINTER TO THIS TYPE*);
LINK = 'NODE
VAR
LBEG,LEND : LINK
POS,PATPNT,PATHEAD
STATUS : INTEGER
CH : CHAR
(* POINTERS TO BEGINNING AND
END OF LIST*);
LINK;
.
( * STATUS FROM PROCEDURES *);
( * VARIABLE TO READ CHAR INTO *);
PROCEDURE READLINE (VAR BEG,EN : LINK; VAR STAT:INTEGER);
(* READLINE CONSTRUCTS A LINKED LIST AND SETS POINTER
VARIABLES BEG AND EN TO POINT TO THE FIRST AND LAST
NODES OF A LIST IF EOF IS ENCOUNTERED BEFORE ANY CHARACTER, STATUS IS SET TO 1 AND A NIL LIST IS RETURNED,
OTHERWISE STATUS=O. THIS ROUTINE READS PAST THE END-OFLINE ON ENCOUNTERING IT AT THE BEGINNING OF THE
PROCEDURE. * )
VAR
P : 'NODE
(*A LOCAL POINTER VARIABLE FOR "NEW" *);
BEGIN
BEG:=NIL; EN:=NIL; STAT:=O; P:=NIL;
IF EOF(INPUT)
THEN STAT:=! (* RETURN EOF *)
ELSE
BEGIN
IF EOLN(INPUT)
THEN READLN(INPUT) (*GET OVER POSSIBLE EOL *);
IF EOF(INPUT) THEN STAT:=! (* IF WAS LAST LINE *)
ELSE BEGIN
WHILE NOT EOLN( INPUT) DO
BEGIN
(* CREATE A NEW NODE FOR EACH CHAR *)
NEW ( P ) ; r e ad ( i n p u t , c h ) ; P ' . DATA : =c h ~
IF (BEG=NIL) (* N.B. THERE IS NO *J
THEN
(* CARRIAGE-RETURN NODE *)
BEGIN
BEG:=P;EN:=P (* INITIALIZE LIST *)
119
END
ELSE
BEGIN
(* ADD TO AN EXISTING LIST *)
EN I • NEXT: =p;
P 1 .PREV:=EN;
pI .NEXT: =NIL;
EN:=P
END
END;.
END(* OF EOF ELSE*);
END(* OF EOF ELSE*);
END(* OF PROCEDURE*);
PROCEDURE FIND(S:LINK;PAT:LINK;VAR I:LINK);
(*
IMPLEMENTATION OF ALGORITHM "FIND" FRONI
HORROWITZ AND SAHNI. FIND IN STRING S THE FIRST
OCCURRENCE OF STRING PAT AND RETURN I AS A POINTER TO THE NODE IN S WHERE PAT BEGINS; OTHERWISE
RETURN I AS NIL.
*)
VAR
SAVE,Q,P : LINK;
BEGIN
IF ((PAT=NIL)OR(S=NIL))
THEN I: =NIL
ELSE (* EXECUTE REST OF ROUTINE ONLY IF NOT NIL *)
BEGIN
I:=NIL
(* ~~E SURE I IS NOT VALID ON ENTRY*);
P:=S;
REPEAT
SAVE:=P; Q:=PAT (*SAVE THE STARTING POSITION*);
( * COi\1MENT OUT DEFECTIVE EXAMPLE CODE
"~ILE ((NOT(P=NIL))AND(NOT(Q=NIL))AND
(P 1 .DATA = Q' .DATA)) DO
BEGIN
THIS CODE IS INCLUDED AS AN
p : =p I • NEXT;
EXAMPLE-- IT DOESN I T WORK;
Q: =Q I • NEXT ;
END·
*) '
WHILE NOT ((P=NIL)OR(Q=NIL)) DO
BEGIN
IF (P' .DATA=Q' .DATA)
THEN
BEGIN
p : =p I • NEXT; Q: =Q I • NEXT;
END
ELSE P:=NIL;
END;
IF (Q=NIL)
THEN
120
BEGIN
P:=NIL; I:=SAVE
(* SUCCESS! P IS SET TO
NIL TO STOP LOOP *)
END
ELSE P:=SAVE'.NEXT (*OTHERWISE CHECK STRING
BEGINNING AT NEXT CHAR *);
UNTIL (P=NIL);
END (* OF ELSE ASSOCIATED WITH
"IF ((PAT=NIL)OR(S=NIL))" *);
END(* OF PROCEDURE 'FIND' *);
BEGIN
(* BEGIN THE MAIN ROUTINE *)
NEW(PATPNT)
(*CONSTRUCT A PATTERN IN A LIST*);
PATPNT'.DATA:='F';
PATHEAD:=PATPNT;
NEW( PATPNT);
PATPNT 1 • PREY: =PATI-IEAD ; PATHEAD 1 • NEXT: =PATPNT;
PATPNT I • DATA: = I 0 I ;
NEW(PATPNT);
PATHEAD'.NEXT'.NEXT:=PATPNT;
PATPNT'.PREV:=PATHEAD 1 .NEXT;
PATPNT 1 .DATA:='0' (*DONE CONSTRUCTING PATTERN*);
READLINE (LBEG,LEND,STATUS);
IF (STATUS 0)
THEN WR I TELN (TTY, ' EOF SEEN ' ) ( * HERE IF ERROR RETURN * )
ELSE
BEGIN
FIND(LBEG,PATHEAD,POS);
IF NOT(POS = NIL)
THEN WRITELN(TTY' I FOUND IT I)
ELSE WRITELN(TTY, 'NOT FOUND');
END;
END.
Appendix G
"FINDA" Pattern-Matching Program
121
122
(* FILE FINDA.PAS -- ALGORITHM 'FIND' USING AN ARRAY FOR
THE PATTERN STRING *)
(* note that a "'" replaces an up-arrow in this
listing as that character is not on print-wheel
of the diablo *)
CONST
MAXPAT = 50;
TYPE
NODE = PACKED RECORD (* DEFINE NODE STRUCTURE *)
NEXT: 'NODE
(*POINTER TO NEXT NODE*);
PREV: 'NODE
(* POINTER TO PREVIOUS NODE*);
( * ONE CHARACTER OF LINE *)
DATA: CHAR
END;
LINK = 'NODE
O (* DEFINE POINTER TO THIS TYPE *);
PATTERN = ARRAY 1 .. MAXPA1j OF CHAR;
MAXLEN = 1 .. MAXPAT;
VAR
LBEG,LEND : LINK
(* POINTERS TO BEGINNING
AND END OF LIST*);
PATARR : PATTERN;
PATLEN : MAXLEN;
POS,PATPNT,PATHEAD : LINK;
STATUS : INTEGER
( * RETURN FROM PROCEDURES * )
CH : CHAR
( * VARIABLE TO READ CHR INTO *);
PROCEDURE READLINE (VAR BEG,EN: LINK; VAR STAT:INTEGER);
(* READLINE CONSTRUCTS A LINKED LIST AND SETS POINTER
V~niABLES BEG AND EN TO POINT TO THE FIRST AND LAST
NODES OF A LIST IF EOF IS ENCOUNTERED BEFORE ANY CHARACTER, STATUS IS SET TO 1 AND A NIL LIST IS RETURNED,
OTHERWISE STATUS=O. THIS ROUTINE READS PAST THE END-OFLINE ON ENCOUNTERING IT AT THE BEGINNING OF THE
PROCEDURE. *)
VAR
P : 'NODE
(*A LOCAL POINTER VARIABLE FOR "NEW" *);
BEGIN
BEG:=NIL; EN:=NIL; STAT:=O; P:=NIL;
IF EOF(INPUT)
THEN STAT:=1 (* RETURN EOF *)
ELSE
BEGIN
IF EOLN(INPUT)
THEN READLN( INPUT) (* GET OVER POSSIBLE EOL *);
IF EOF(INPUT) THEN STAT:=1 (* IF WAS LAST LINE *)
ELSE BEGIN
'WHILE NOT EOLN( INPUT) DO
BEGIN
( * CREATE A NEW NODE FOR EACH CHAR *)
NEW ( P ) ; r e ad ( i n p u t , c h ) ; P ' . DATA : =c h t
IF (BEG=NIL) (* N.B. THERE IS NO *J
THEN
(* CARRIAGE-RETURN NODE *)
BEGIN
123
BEG:=P;EN:=P (* INITIALIZE LIST *)
END
ELSE
BEGIN
(* ADD TO AN EXISTING LIST *)
EN I • NEXT : =p ;
P 1 .PREV:=EN;
P' .NEXT:=NIL;
EN:=P
END
END;
END(* OF EOF ELSE*);
END(* OF EOF ELSE*);
END(* OF PROCEDURE READLINE*);
PROCEDURE FIND(S:LINK;PAT:PATTERN;
VAR I:LINK;PLENGTH:MAXLEN);
{*
'
AN ALTERNATE IMPLEMENTATION OF ALGORITHM "FIND" FRaVI
HORROWITZ AND SAHNI.
AND
FIND IN STRING S THE FIRST OCCURRENCE OF STRING PAT
RETURN I AS A POINTER TO THE NODE IN S
BEGINS;
OTHERWISE RETURN I AS NIL.
~~ERE
PAT
*)
VAR
SAVE,P : LINK;
Q: 1 .. MAXPAT;
BEGIN
IF ((PLENGTH=O)OR(S=NIL))
THEN I: =NIL
ELSE
BEGIN
(* ATTEMPT A
STRING IF
STRING( 2)
I:=NIL
P:=S;
MATCH STARTING WITH BEGINNING OF EACH
NO MATCH, TRY TO l.VIATCH PAT( 1) WITH
AND SO ON *)
(*SET TO NIL IN CASE WE \VERE PASSED A
NON-NIL VALUE*);
REPEAT
SAVE:=P; Q:=1 (*SAVE THE STARTING POSITION*);
WHILE NOT ({P=NIL)OR(Q)PLENGTH)) DO
BEGIN
IF {P 1 .DATA=PAT[Q])
THEN
BEGIN
p : =p I • NEXT ;
Q: =Q+1;
END
ELSE P:=NIL;
END;
IF (Q)PLENGTH) (* CHECK TO SEE IF WE EXAUSTED
124
PATTERN LENGTH *)
THEN
BEGIN
P:=NIL; I:=SAVE (* IF SO, WE HAVE A MATCH*)
END
ELSE
BEGIN
P:=SAVE' .NEXT (* OTHERWISE, CHECK STRING BEGINNING WITH NEXT CHAR *)
END;
UNTIL (P=NIL);
END (* OF THE ELSE ASSOCIATED WITH
'IF ((PLENGTH=O)OR(S=NIL)}' *);
END(* OF FIND*);
BEGIN
(* BEGIN THE MAIN ROUTINE *)
(* ARTIFICIALLY LOAD PATTERN ARRAY WITH A STRING *}
PATARR(l} :='F'; PATARR(2] :='0'; PATARR (3] :='0' ·
PATLEN:=3 (*SEARCH TAHGET STRING FOR 'FOO' *i;
READLINE (LBEG,LEND,STATUS)(* GET FIRST LINE OF THE
FILE I INPUT I*) ;
IF (STATUS 0)
THEN WR I TELN (TTY, ' EOF SEEN ' ) ( * HERE IF ERROR RETURN * )
ELSE
BEGIN
FIND(LBEG,PATARRtPOS,PATLEN);
IF NOT(POS = NILJ
THEN WRITELN(TTY,'FOUND IT')
ELSE WRITELN(TTY,'NOT FOUND');
end;
END.
,
Appendix H
Flow of RUNOFF Justification
125
126
j-.y SoaceJ
o, Lt l"'e7
r~
1~0
I
I
I
l_ _____..,.
'J
,( sp•'-«> •'• w,, J.t.J ~foe /,ff•( I-f• /,;,,, Cx<P:>
1
1'1UMJ,,.. ,/-' f,:.,.~
w>t aJ./ ,_. ~ &drf,.L $..0.:. c.e; 4Y ~,_.,N
,_~ a UIViVr
r~• u.o ~ ""'~ IJy J~h/'~ lhJt:DA.tD/f'JJn~t/'lt'
tiJ..
1
,r
;:-:::::::::::=:: :::::-:::::::::::::::::::: ::.:;:::::: ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
~.; ~~AC1Ll b
~.·:.:...c..o:.la.t!Q
-.lS
oo
.................... _ ................ ,.I
I
I
I
I
I
I
I
I
I
I
I
I
A
~
:0
I
I
.I
I
____ JI
tl~/,,:;'1
a.,.
~-rr~ ~t7•CC.
Appendix I
Text Processing Midterm and Final
127
128
Text Processing
Wi 11 i ams
cs
Midterm
361
April 4 1979
This is a closed book, closed notes test.
Please
answer all questions in the space provided on the test.
The following are short essay questions to be answered
as concisely as possible. A simple "Yes" or "No" is too
concise.
1.
What is meant by a meta character?
2.
What is the main difference between TECO
line editor (in terms of addressing)?
3.
Why is it necessary to type "271" to TECO in order
to insert an alt-mode (symbolized by $)?
and
a
4.
J(:SProcedure P'X$;0tt.-z;)$$
What does this macro do?
5.
In TECO, what is a reasonable way to
from other macros?
6.
TECO usually requires an alt-mode to delimit
commands.
Please specify what genres of command
require such a delimiter (in any editor).
7.
In the text editor we specified, what convention
would need to be changed to insert a line below
the last line in the buffer if the "Append"
command didn't exist?
8.
In our .PASCAL, why would it be necessary to
implement a separate dispose routine for different
record types?
call
macros
129
9.
It has been alle~ed that our editor is easier to
implement with linked lists. How would you defend
this thesis?
10.
Why is it necessary to know the
pattern
string
when
simple
implemented with an array (as
linked-list)?
11.
How would you change the editor specified in class
to allow the user to type multiple commands on one
line?
12.
Why is it so easy for TECO to allow a user to type
'N (i.e., a "control-N") before any one of its
other search-string-specifying commands, when that
following command may be arbitrarily complex?
13.
In many searching algorithms, a pass is made over
the
pattern strin~ to "compile" it into an
internal representation. Why?
14.
What is a major drawback of
algorithm implemented in TECO?
15.
In many text formatters, a distinction is made
between the line length and the line width of the
output line. Define these terms.
16.
In formatted text:
a) What is a "river"?
b) How is it avoided?
length of the
searching
is
opposed to
a
the
searching
130
17.
The Software Tools formatter provides a facility
to oufpuf-fhe-page number either in the header or
the footer, in any character position.
How is
this accomplished?
18.
In respect to text formatters:
19.
Assume you are told to install a "figure" command
in a text formatter that allows enough space for a
figure of a certain size.
a) Give the exact format of the command
(for the user)
b)
20.
What is a "Break"?
Specify how you would implement it.
One way to implement a control structure in the
editor is to use a case statement to recognize
which command has been typed.
a) What happens if the user types
an illegal cornnand?
b)
How would you protect yourself
from this?
131
21.
The following is a PASCAL attempt to implement the
previous question. What is wrong with it? (Give
PASCAL
language
faults.)
Note
that
Teletypeinputroutine is a function that returns
the next character from the teletype, and that the
i den. t i f i e r s end i n g i n "command" are p r o c e d u r e s .
Finding 4 faults is sufficient for full credit.
~!£~~g~!~
Commandloop;
Legalcommand = (A,B,C,D,E,Z);
LC : Legalcommand;
CH : char;
STATUS: integer;
TTYSTAT:=O; STATUS=O;
CH := Teletypeinputroutine;
IF CH IN LC
THEN BEGIN
----CASE CH OF
A:
B:
C:
D:
E:
Z:
~~;
Appendcommand;
Bottomcommand;
Changecommand;
Downcommand;
Endoflinecommand;
Quitcommand(sets "status");
132
Text Processing
Williams
cs
Final Exam
361
May 21, 1979
This is a closed book, closed notes test.
Please
answer all questions in the space provided on the test.
The following are short essay questions to be answered
as concisely as possible. A simple "Yes" or "No" is too
concise.
1.
What is meant by a meta character?
2.
What are the two most significant differences
betwen RUNOFF and a composition program for a
typesetter (like PAGE-3)?
a)
b)
3.
Why is it not possible to make the
declaration in NOS PASCAL?
TYPE Charset = SET OF CHAR;
following
4.
The matrices of many pre-TTS typesetters did
not have widths relative to each other. How
was justification achieved on these machines?
5.
What is meant by the
of type measurement?
6.
What is meant by
photocomposition?
7.
In-regards to typeface design, what
by "x-height"?
"relative-width
"base-line
system"
alignment"
is
in
meant
133
8.
What are two options available to a printer
(or composition programmer) in eliminating a
"widow" line?
a)
b)
9.
What is a "discretionary hyphen" and
its use?
10.
What is meant by
descendent from?
11.
a) What is "kerning"?
"TTS"
b) How is it accomplished
cold typesetting?
and
in
what
either
what
is
is
it
hot
or
12.
What is "mortising" (in printing)?
13 .
Wh a t
is
t h e rna i n
third-generation
typesetter?
14.
What
distinguishes
a
typesetter from others?
15.
What are two methods for sizing (making larger
or smaller) characters in a phototypesetter?
a)
b)
difference
between
a
and
second-generation
fourth-generation
134
16.
Many fonts are measured in parts of an "em".
That is, an "em-space" may measure 18 units,
while an "en"
space
only
nine
units,
regardless of point size.
Please give the
conditions under which it is necessary to
convert from this system of measurement inside
a typesetter and why.
17.
Give three methods of storing width values (of
characters)
on
typesetters
(not
just
third-generation).
a)
b)
c)
18.
What is meant by a perimeter character?
19.
What is meant by "run length encoding" and how
is it used?
20.
In justifying a line, there are two ways to
manipulate the amount of space occupied by the
words on that line. What are they?
21.
What is meant by "leading"?
22.
What is vertical justification and
it be used?
why
would
135
23.
In hyphenation:
a) What is meant by "cleaning up the word" and
why is it done?
b) Why would it be
characters within a
up"?
necessary to
examine
word while "cleaning it
24.
What are the two main
accomplish hyphenation?
methods
used
to
25.
What is "vertical justification"?
26.
In hyphenating a word, some procedures attempt
to hyphenate by beginning from the right end
of the word. Why would this be preferred to
working from the left?
27.
What is meant by a "homograph"?
28.
What is an exception dictionary and how is
used?
29.
What is a common method of ensuring the
correct spelling for input to a typesetter?
30.
The early composition programs set 3-column
text by reading the first line for each column
from a file by random-access, then settin~ the
first
line of each column with these lines.
a) Why was this necessary?
it
b) What problems were discovered in using this
procedure?
136
31.
What were two advantages of
Selectric Typewriter?
the
IBM
Magtape
a)
b)
32.
What is the difference between an
typewriter and a word processor?
electronic
33.
What is meant by a
display system?
standalone
34.
What are two advantages of
station in word processing?
35.
What is a "boilerplate"?
36.
What is a shared-logic word processor?
37.
What
is
the
main
disadvantage
shared-logic word processor?
38.
What is meant by a "data processing interface"
in relation to word processing?
39.
What are two advantages of
processors?
40.
How do word processing companies. justify
relatively high cost of a word processor?
41.
How does an Electronic Message
from a computer network?
"thin
window"
a
"dual
media"
of
shared-logic
System
a
word
the
differ
137
42.
\Vhat are two methods of managing
Electronic Message Systems?
43.
What are four reasons that
Electronic Message System?
would
messages
in
justify
an
a)
b)
c)
d)
44.
What is meant by a "passive" network?
Download