CALIFORNIA STATE UNIVERSITY, NORTHRIDGE AN EXPERI~illNTAL CURRICULUM FOR TEXT PROCESSING A project submitted in partial satisfaction of the requirements for the degree of Master of Science in Computer Science by G1 en Wi 1 1 i ams ~ June, 1979 The Project of Glen Williams is approved: P r o f e s s o r lVIo t i l Professor Walsh Professor Abbott California State University, Northridge i i CONTENTS Contents .............. . Abstract ......•• The Objectives . . . . . . . . . . . . . . . . . The Realities .. TECO • . . . • . . . . . • iii .iv . .... 1 .13 .14 • • • • • • • • 21 • • 24 .32 The Editor Assignment. ..........•.. An Aside On PASCAL. Pattern Matching .. Formatting . . . . . . . . . . • ••• 3 7 . ... 4 2 Typesetting . . . . . . . . . . ...•. 4 7 First Generation Typesetters ...... . .52 TTS . . . . . . . . . . . . . . . . . . . . . · · · · · · · · · · · · · The Advent of Photocompostion.......... .54 Second Generation Principles......... ....... .57 Third Generation Typesetters. ..... . . . . . . . . . 61 Compos i t i on • . . . • . . . . . . . •••. . . • •. . . . . . •. •66 Hyphenation......................... . 70 Input To A Typesetter....... ........ ...... .74 Pagination......... .76 Word Process in~.... . .... ...... .... ........ . .80 Office Automation. ....•.. ............. ..85 Con c 1 us i on . . . ... .. .......... . . 91 Bibliography. ...... . . ......... . .96 Appendix A (III 1070 Editor User Guide). .97 Appendix B (A Sample "Read" Program) . . . . . . . . . . . . . . . . . . . . 100 Appendix C (CDC Character Set) . . . . • . . . . . . . . . . . . . . . . . . . . . 103 Appendix D (A Set Manipulation Program).. ..... . .. 105 Appendix E (Editor Specification)............. . ... 107 Appendix F ("FINDL" Pattern-Matching Program) ... 117 Appendix G ("FINDA" Pattern-Matching Program) .. 121 Appendix H (Flow of RUNOFF Justification)..... . .. 125 Appendix I (Text Processing Midterm and Final) ... 127 i i i ABSTRACT AN EXPERIMENTAL CURRICULUlVI FOR TEXT PROCESSING by G 1 en Wi 11 i ams Master of Science in Computer Science This project details the definitions and implementation of an experimental curriculum for an upper division class in Text Science Department at Processing within 2) to experiment Computer CSUN in order 1) to determine what information should be presented about this and the with a emerging field method of presenting that information. The class was offered in the Spring of 1979 under the direction of the objectives defined in the first section of this paper. arose In teaching the concerning class, certain realizations the implementation details specified to meet these objectives. As often as not, these details were not predicted. Therefore, divisions: this paper consists of two conceptual a statement of the a priori objectives; description of interspersed with the material remarks iv actually and a covered on the class's reaction to the material. Topics covered include the processing use of TECO as a text language, the implementation considerations for a text editor program, considerations for a pattern matching, implementation formatting program, the history and development of typesetting and phototypesetting devices and composition software, word processing, electronic message systems and the automated office. text editor The implementation of a was assigned as a project, which necessitated some discussion of the PASCAL programming language. v The Objectives In considering the nature of aspects may somewhat come to loosely d i s c i p 1 in e s mind. defined "text Text in processing" processing relation to computers information). more is as yet the other that together camp r i s e "Camp u t e r S c i en c e" • has been with us in one form or another first many (in the almost since It the form of holerith cards carrying Yet it existed almost 'by the leave' of the urgent tasks at hand -- the loading of data (programs included) into the computer. involves its many of Even now text sister disciplines. processing For instance, pattern recognition is involved in the process of capturing text during OCR while data compaction can be invoked in storing characters in a DBMS (as course, text processing may with processors and And, of be said to 'take over' such heretofor non-computer disciplines word ADABAS). as editors) typewriting and (e.g., typesetting (phototypesetters). The faster text than processing most other field is currently expanding areas of computer science in the form of the automated office (wherein text processing plays a central role), electronic message systems, and document/data processing interfaces. The automated office and the Electronic Message System ( ElVIS ) are c 1o s e 1 y r e 1at e d , as i t i s t h e EMS t h a t p r o v i de s the capability for sharing and storing data from heretofore isolated word processors. The 1 EMS also enables the 2 travelling executive to maintain intimate ties organization at any time of day. His time is saved in that recipients of his communications need not be with his "present 11 to be reached. Another new linking of disciplines is introduced the relation of document Document processors are link to form: for data data processing to data processing. adding processing the documet 11 with capabilities applications to directly in an interesting hides 11 binary information in text files processing up-to-date reports. programs to utilize in providing For instance, the latest figures in a company's financial report may exist in a text file that is accessed by data-processing programs to make financial projections. Therefore the objectives of a seek to identify the reach of text processing text processing. introducing the field, a great deal of emphasis on class is In placed knowing the problems inherent in a well engineered text editor. This appreciation knowledge for the for its manipulation. equips the student with an nature of 'text' as well as methods The rest of this section discusses the objectives in more detail. TECO This course therefore introduces the students to TECO, a powerful character-oriented text editor and progrrunming language sold by Digital Equipment Corporation. TECO may 4 smallest addressable unit of text. an introduction specify and discussion a an editor. SOS is a where a implement particular, SOS. to SOS mentioned as of the various ways to text-processing program: in Augmenting this is a discussion of progrrun stands They are written for "SON at Stanford OF STOPGAP". University, It is so nruned because it is meant to bridge the gap between line-oriented editors (like LINED at DEC) and TECO. Stopgap is primarily line-oriented in that one may address lines as the smallest addressable unit of text. However, it differs from simple line-oriented editors in its 'Alter' command. allows which the he user may to move character-cursor enter an intra-line editing mode in through moving This command the commands, current and line with of course, he may modify individual character strings within the line without retyping the whole line. The Editor Project The purpose of placing the editor beginning of the course a The class is required to functional specification of an editor: specifying the editor he will trauma, discussions for at the implement. of how to implement To ease their following few sessions will each feature internal routines may be defined complimentarily addressed. hand each student center on what a reasonable editor should incorporate. subject the is to allow the assignment of a simple editor as a project. in discussion The and how the are also 5 To be specific, some considerations are: 1. The language the editor is to be written in 2. What type of device will it smart terminal?) 3. Line editor or character editor? The choice affects the cursor-positioning commands at the disposal of the us.er 4. How will user get a display of his text? 5. How is the cursor represented? 6. Will the cursor position follow typeout (as with XEDIT, a line-oriented editor for CDC machines)? 7. How is the cursor moved? 8. What form commands? 9• How many commands wi 11 the user be able to enter at one time if commands are entered in line-mode? the program gets i.e. ' characters from the terminal only after a 'break' character is entered. 10. What form should insert take and how should it work? That is, should text be inserted before or after the cursor? What bearing does this have with entering text before the first line or after the last line? 11. Should the delete command delete characters or lines·or both? 12. How do we divide a file into pages--or we? 13. What form should the editing buffer and/or the Q-registers take? 14. How is text entered into a Q-register (a Q-register is an unbounded temporary storage area for saving text) 15. How is text retrieved from how is this specified? 16. Searching should it search around end-of-lines? should it search backwards? what happens to the cursor in the event of either failure or success? Should there be class-matches for pattern elements? wi 11 commands run on (TTY or In what increments? take? a One-letter should Q-register and 6 17. Help command. How will provision be made for allowing a user to get a refresher of which commands are available, etc.? 18. Meta characters: How does the user specify characters that are to be in the output file if the character is not in the keyboard's character set? Meta characters are one possibility. For instance, SOS uses the question-mark as a meta character. To insure the milestone-style completion assignments are of this editor, made throughout the semester. Pattern Matching A subject of great importance within is pattern matching. Concerns within text processing pattern matching theory include minimizing the number of operations required for an average search and matching classes of patterns that can be defined simple as expressions. In introducing the procedure FIND described by Horowitz and Sahni 1 is examined. that is This procedure runs in time O(mn) where m is the length of the pattern regular pattern and matching, n the length algorithm is obviously quadratic. can easily Sahni 2 , of the string. From this Such an procedure we move to an improved FIND, also in Horowitz and --------iElll;-Horrowitz and Sartaj Sahni, Fundamentals of Data Structures (Woodland Hills, californTa:---computer ScTence-Press~-Tnc., 1976), p. 191. 2 Ibid., p. 192. 7 that introduces the short-cut of last characters length is terminating greater characters in string. also running the first and of the pattern with their counterparts in the string and also of pattern checking in than This time the when the the number of remaining algorithm, O(mn), search while ·potentially actually exhibits better performance in actual useage. From concerns of speed it is possible to move on to a discussion of "A Fast Searching Algorithm" 3 . This algorithm introduces more preprocessing of the pattern as a means of reducing the number of characters examined in the course of a search. that never we progressing cycling It employs a move through through technique· backwards in string. The the the for ensuring string actual while loop for the string is also shortened by employing two arrays that provide the information of how far we can "slide" in the string as a function of either the character at the cursor subpatterns or within as the a function pattern. of reoccurrances of This technique has been found to be sublinear in the number of comparisons required to find a successful match. "A fast of the ocTober-;- 8 attern-class search as matching found "wild-card" techniques in In SNOBOL. matches as to employed speed such types of discussing in TECO, the subject of very general pattern searches may be broached. a pattern of up to 36 characters. position of the pattern we may class of so-called specify characters as a match. In any TECO allows any character character For instance a control-X character entered as a character for a given position cause TECO to accept ~gy character as a match. will One may also specify to accept any element of an enumerated set characters. or of By prefacing a specification with a control-N character, we search for any match except that specified by the following specification. patterns is or rejected t~at as a character in the string can be a than 7 machine The instructions. reason, of course, is due to the first pass TECO makes to encode the pattern. the accepted match in one machine instruction. search loop itself is less The The beauty of TECO's wildcard student Thus it should become obvious to that a small amount of preprocessing can save one or two orders of magnitude on operations, depending upon the length and complexity of the pattern. The obvious drawback is that we cannot make searches contain that an elipsis. A very general pattern search on the order of SNOBOL is coded and described by Kernighan and Plauger 4 . --------i~;i;~-W. Kernighan and P.J. Plauger, Software Tools, (Reading, Mass.: Addison-Wesley Publishing Company~ TY7oT, pp. 135-161. 9 Their pattern expressions pattern matching except which of the considerations. regular all covers patterns of the form x(albc)y, the matches preprocesses by algorithm either xay or but pattern, xbcy. It also for any speed not It is mainly to make the classes the actual matching logic. readable Their treatment is valuable to the student in that it clearly delineates all the involved in implementing a sophisticated (in the sense of handling closure) pattern matcher. treatement as well It gives an extensive to both the building the pattern representation as possible to the leads utility for a of recursion closure in match. following Following treatment of matching, the idea of searching and in the their same work student: steps operation is perhaps all their is detailed. its block-structured language: Finally, a to written the in avid a pseudo RATFOR. discussion characteristics are replacing The final beauty of accessability routines the on blending speed the of the Boyer and Moore sublinear searching algorithm with the generality of either TECO's or Kernighan and Plauger's probes pattern matching. most of the important aspects of Such a blend is indeed possible but to my knowledge, not yet realized in a commercial product. Text Formatting Kernighan and Plauger also give a detailed, program. 5 discussion and example of fine, a if not too text formatting 10 Incorporating material in that from~£!!~~~~!££!~ is useful it allows discussion of fine details by examining the example program developed therein. Two prime functions of a text formatter to do are covered: "page layout" and horizontal justification. these discussions appropriate corrmands to allow Within the user control over these functions arise, as well as the criteria for their implementation. However, their logic for justification in that it does evenly as does RUNOFF not wanting seem to distribute extra spaces as RUNOFF. employs seems Therefore, the is examined in detail. exact algorithm Examples of RUNOFF input files are also given in order to make the sense above is of certain commands more concrete. The reason page layout was in neither RUNOFF quotes that nor the Kernighan and Plauger program deal with page layout as thought of by printers. Page layout to RUNOFF is managing to get headers and footnotes as well as the text on the page in a manner Page layout as envisioned complicated and is only corrmercial phototypsetting by to pl~asing printers recently being concerns. is a page One does much more by illustrative a photograph and asking a typesetting program to "flow" the text around it, say in a double-column format. RUNOFF eye. attempted problem is that of specifying a position for on the not But while attempt such a task, a discussion of its pp. 219-250. 11 concerns prepares the student to deal with many of the concerns of real typesetting. Computer Driven Phototypesetting The major text for the Modern typesetting by Reports" fame). material from E!9.!!i~IE~!!..!. Seybold Fundamentals (of Concepts in this text are Electronic Market 7 by published John is In the of "Seybold augmented with and addition, Information progranming International, manuals Inc. are referenced when dealing with typesetting programs. The sequence of the Seybold book is followed closely. The first six chapters deal with the history of printing as well as the various characteristics of These chapters are important in various ways: this facet of text processing aware of printing, the complications whether mechanically. The within the field. the relations it enn.: is main student inherent created in "type". in studying must be made dealing with electro-optically or complication is non-standardness are relative (and are not necessarily transitive.) Aside from Seybold 7 Electronic PF~g d~~~~!~l~ ~Tf MQ~~Tr~ g~~QQ~i!iQ~ and 1 uu ICaLIOns, Egui~ment-JmarKef--(New--york: rgvvy~--- the Almost all measures -(lV_I_d-~---P6 J~h~-Seybold, e 1a, somewhat ------ nc., v77., Electro-optical Publishing ___ Frosr--ana-Sullivan;-rnc~, 12 exposure to inherent from a survey complications, of the methods history of typesetting. character on a student employed profits throughout the For instance, the idea of a "meta" is introduced in the 1930's for shifting 'rails' paper-tape-driven 1880's), the the linecaster. inventers of Even earlier (the Monotype developed the system used even today in measuring characters as values relative to the widest character in the font. Therefore, the first nine chapters assigned and discussed. current typesetters. the logic the book are This leads the discussion up to The other chapters covered include of hyphenation and the specification and actual programming of composition languages. some of As mentioned above, material on the latter topics are derived from actual composition 1 anguage s of twar e at I I I. Office Automation The relationship of composition to word processing and office automation is explored, followed by a proper study of office automation and these electronic mail. Material for topics is drawn not only from Seybold, but also from reports from various researchers and Stanford Research detail the progression Institute beginning from the of and companies Datapro. electronic mail as Xerox, These reports as a logical network technology developed in the form of the ALOHANET and the ARPANET. The Realities In this second part of the paper the main notes to the class are detailed along with a discussion of the actual progress of the class. It is here that the discovery is made that concepts clear to one with prior understanding or experience are not intimation alone. made familiar to the novice by I hope that statement becomes clear as we progress ••. 13 TECO The introduction to TECO was example. a accomplished ~!~~~~ Such conventions as how to use the delimeter and invoker) command and representation (a dollar-sign) are given. visual Four main genres Pointer manipulation, Deletion and Search. Type-out and An example page of FORTRAN code used for demonstrating these xerographed by key (as its Insertion, of command type are then given: mainly commands was and distributed to the class as well as a copy of the TECO pocket guide (a mini reference manual). The lecture content up to this point did not challenge the class, simple commands. a It was after the introduction of TECO "programning" to as it appeared to be an editor with language some that interest demonstrated in the form of probing questions to seem as was (as opposed "Is it true that this effects .•. ? 11 questions). TECO as a text processing language is seen. Given enough the insight powerful I have and/or patience, almost any (N~te task can be accomplished. most that I didn't say g~l~~lY --TECO is an interpreter.) For instance, TECO will return a flag as to whether a file exists on the disc and be persuaded to throughly However, it would have taken teach language than was time for. challenging powerful all the much capabilities of the Therefore, with a simple but assignment in mind, I covered many of the more constructs execution). can to use that file as a data file or as a TECO program to be executed. longer then (including macro The pith of this follows. 14 generation and 15 TECO is a character-oriented editor and the II . signifies II is a place integers of and the current cursor address. storage and It text. can may store act character A "Q-register" two data types: as holding either, but When cannot be thought of as holding both simultaneously. used for storing an integer value, it is loaded with that value by preceding the letter "U" with an evaluates to For integer. an expression .U1 instance, that loads Q-register "1" with the address of the cursor. Q-register names are any digit or letter of the alphabet. In the same manner OU1 zeros Q-register 1. use with Q-registers are Other numeric operations of %and=. The percent operator increments the following expression while = types the value of an expression out to the Thus %A would terminal. increment the contents of Q-register "A" and QA= would type out the value of Q-register "A". value within Q-register "A". text. One easy A simple "QA" returns the Q-registers nX*, where n is the (beginning with the current operator, is and "*" a be stored the text hold and away and number of lines to be stored cursor position) Q-register name. retrieve this text, position the cursor at want also way to load one with text is to position the cursor "in front of" the text to type can X is the In order to the point you type G* where * is a Q-register name. The contents of the Q-register will be inserted between the two characters addressed address may be thought of by as the "how cursor many (the cursor's characters that precede the cursor in the buffer", and the cursor is always "in between" characters). If a G command is issued and the 16 buffer an has integer stored in it, an error message is generated. Another expression. This returns the equivalent ASCII code (in the radix, prevailing character in the item of interest is decimal) of the the "nA" default is .+n+1th buffer. For· example, a "0A'' return the character code of the character immediately following the cursor, value the while a "-1A" will return the character immediately preceding the cursor. two ways way, is to create somewhat algorithmic equivalent language. "command loop". forms. "macros" in TECO. The to a of Now, there are A "macro", by the procedure in an The first appears in the form of a This, in turn, is usually employed in first is easier: two n(command string). This syntax (including the braces) causes the string of commands within the braces to be executed "n" times. Unfortunately, braces won't print on the printer used for this parentheses> of this strings are is: are string1 is argument used in their stead. commands with and The hopefully generates After string2 executed, control is passed to string1 again. the numeric argument evalutes to to as a "conditional semi-colon passed to string3. command a numeric The semicolon causes control to pass to string2 if n negative. is three the following interpretation: the semi-colon. control a has been In the event non-negative integer, This loop is then referred loop" (by virtue of the evaluation.) Now, if a search (for a string, of course) fails while in a command loop, it generates a for so A more general form (string1 n; string2)string3. executed to paper, use by a semicolon that may follow it; zero or a -1 if the 17 search So succeeds. occurrance of the <sFOO$; Ott> string type out the line on which command loop wi 11 search "FOO" If it finds one, it will it is found, and begin if not, it exits the loop. again; an for the Command loops may be nested. The other form of macro looks more like a program: !TAG! anything between exclams is a label or comment Otag$ unconditional go-to; transfers control to label !tag! exp"#string1'string2 Expression exp is evaluated. # is an L, E, N, G. So if # is L, strin~1 is executed, then string2 1f ex is lessthan zero. -- -- -- ----E-=-equaT to zero, N = non zero G = greater than zero. If condition not met, control is passed immediately to ' ~!~~E.!.~ Macro Body Running Comments !START! -10u1 !LOOP!c %1 We start the macro here. Load Q-reg "1" with a -10. "c" moves the cursor up one character and the %1 increments the contents of Q-reg "1". The contents of Q-reg "1" are used in the " evaluation. Spaces and carriage-returns are ignored in the macro body. Q1"L OLOOP$' !DONE! If the content of Q-reg "1" is less-than zero, control returns to label !LOOP! command, otherwise "Oloop$ command (when is via the contents skipped !OONE! (a comment, but can be simple macro has the effect characters to the "right". and the of The same II Qll are unconditional jump .GE. zero) the control passes to label next moving macro command). This the cursor ten can be coded 18 using a conmand loop: -10U1(0161Q1;)$. Another useful variable is "Z". the "Z" always contains address of the last character in the buffer. In other words, it is the number of characters in the buffer. useful constants are the ascii values of carriage-return and linefeed, decimal 13 and variable can be Some 10, useful in respectively. terminating The "Z" macros. For instance, (C.-Z;) will move the cursor "to the right" until it is moved past the last character in the buffer. point, the loop is exited. The same may !LOOP! .-Z"E OFINISH$'COLOOP$!FINISH!. be At that coded as: The class also was told (in the first lecture) than the "L" conmand would move the cursor to the beginning of the next line. i s on the l as t l i n e , t he " L" c onma n d wi l l If the user p o s i t i on t he cursor past this line. A macro may be executed by typing its body followed by two alt-modes. prudent) to Since develop it is programs usually with easier and (and more editor, then "execute" it, another method for executing macros is given. As was mentioned Q-register above, with text. it is possible to develop a a A Q-register may then be "executed" by typing Mq where q is a Q-register name. may "load" Therefore, we macro in TECO's editing buffer then load a Q-register with the macro body for later execution. Macros t hen l o ad a be c orne , Q-register I!LOOP! .-z e f f ec t i ve l y , with the sub r o u t i n e s . above EOFINISH$'COLOOP$ cannot insert alt-modes using macro, !FINISH!$$. the insert So , we to just type Not quite. command We since 19 alt-mode is the delimeter. to type 271. value One way to insert an altmode is This is a TECO command that inserts the ASCII preceding the "I" into the buffer, and "27" is the decimal ASCII value of an altmode. Given this knowledge, the class create a was assigned to text file of ten lines or so using TECO, then 2) to write a TECO macro to count the number of lines editing 1) They buffer. in the could write any kind of macro they wished, so long as it worked. I instructed them to get the macro into the editing buffer in text form, and to have the file of text (to be counted) available on further nausea on text To save their part, I gave them an incantation that would load their macro into a the disk. Q-register, file into the editing buffer. on the hard copy to the terminal then then load They were to turn type the following commands: HT$$ HXA$ HK$ ERFOO$ Y$ HT$ MA$ !Type out macro body! !Load Q-reg A with macro! !Clear buffer! !Find file FOO on dsk! !Load file into buffer! !Type contents of file! !Execute macro in Q-reg "A"! Their macro was to type out value lines in the buffer. of the One acceptable solution would be Ou1 !zero out Q-reg 1 for counter $ ;%1) Q1=$$ !search for a literal carriage-return! !by typing a carriage-return in the ! !search string. Increment Q-re~ 1 if ! !found, exit loop if search fails <s number of 20 Other solutions could involve 1) buffer examining each Q-register on seeing a character moving and carriage-return through the incrementing the or on seeing a line-feed (but not incrementing on both), 2) moving through the buffer with the versions need would "L" to conmand. detect testing the cursor against "Z". Both the of the latter bottom of buffer by r · The Editor Assignment The TECO assignment acquainted the class with some the options editor. In that must be explaining of considered when specifying an characteristics the this of editor/language alternative schemes of specifying TECO were considered. locks it The into fact a that certain it is character-addressable cast already. unique to other three other editors One, Characteristics were then discussed. which I wrote, is a display-oriented character editor The (a list of its commands appear in Appendix A). two are essentially line-editors. One of the latter was the SOS mentioned in the Objectives, the written other other an editor for use under the UNIX system and is described in detail in Software Tools. 1 The class was made aware the first day that they to write class. an editor as part of the requirements for the They were to hand in an external the editor they would implement. certain minimum preferably, in editing PASCAL. teach the The reason class manipulating text. were written in how specification capabilities, The of The editor was to provide and requirement PASCAL had its rational in staying SNOBOL. were clear written, to write it in of FORTRAN and is the writing of the editor was to to build and organize This included searching. SNOBOL, the task of tools for If the editor implementing a searching routine and therefore understanding how searching --------IK~~~i~han and Plauger, QQ~ £l!~' pp. 163-217 •. 21 22 is effected, would not be accomplished as the routines part of the kernel. FORTRAN was foregone for reasons I hope are obvious to the reader. for data structuring. to have a common Thus bugs in are PASCAL has more affinity SIMULA was foregone because I wanted basis for discussing coding problems. one compiler are worth discussing in class; allowing two languages would, in my opinion, turn the class into a programming languages class. In covering the other editors, I emphasized that convention or feature the implementor allows in the editor molds the other features as yet unspecified. if a line editor is For instance, contemplated, the implementor must anticipate the need to modify characters on a He is faced with the decision of given of those characters line. forcing the user to completely retype the line or of allowing the only each modification requiring modification. If he chooses to offer the latter capability, specification implementation be consistent with of that command must and commands already specified. During these discussions the class in exhibited concern that they had no idea as how to specify an editor: first milestone assignment in the exemplified in the project. So the questions Objectives (concerning editor design) were posed academically. This discourse was interrupted by a deep anxiety on the class's part about writing the editor in PASCAL; of the only two people in the class admitted knowledge language. I decided that I really wanted a common basis of PASCAL in the project, so I agreed to deliver a 23 few lectures by way of introducing it. 24 An Aside On PASCAL In order to spare the class the expense of buying many books, a copy of the library. ~~[!~~~~ !~~l~ was put on reserve in I then strongly urged them to buy a ~~~g~~i~g i£ ~~~£~ll· Grogono's a clear treatment valuable routines. our (CDC) of part, Dispose implementation routine then routines that manage structures). In it in was unsatisfactory. Grogono provides a routine to test the Dispose in implementation, of the language and examples of For instance, the particular copy This book was augmented with examples of my own design, but for the most gave too a given offers replacement Create and Return unique data types (usually record addition to the Grogono text, I informed the class that there is a documentation file on NOS that outlines the peculiarities of the NOS PASCAL. PASCAL was introduced to the class by quickly naming the "gross" or categorical features of the language. I.e., that the complete syntax of the language can be represented on two pages using Wirth's diagrams as well as the following: 1. All variables must be declared 2. Keywords (reserved words) are the diagrams 3. The semicolon is a statement separator --------lP~t~;- Grogono, Programming Mass.: capitalized in in PASCAL (Reading, Addison-Wesley PuoTTsliTng-Co. ,-Tnc:--;-r978). 25 4. There are four built-in data types a) real b) integer c) boolean d) char(acter) 5. There are also four built-in data structures a) array b) record c) set d) file (sequential in standard PASCAL) 6. There is facility for user-defined data types. Example: Type iodevice = (diskdevice, printdevice, readdevice, drumdevice) The four scalars are the only values the type iodevice can assume. 7. There is dynamic stora~e allocation (but not for arrays unless defined as a component of a record). 8. Associated variables. 9. Labels are declared. with records unsigned are integers typed and pointer must be 10. There is a CASE statement (but has no "others" label). 11. The FOR FOR The 12. Procedures recursive. 13. Can call by reference or by value. 14. All blocks are named. 15. The boolean operators are AND, OR, and NOT. FOR statement can take one of two forms: id := exp1 TO exp2 DO statement id := exp1 DOWNTO exp2 DO statement implica,tion is the "step" is always "1". and Functions are automatically The remainder of the PASCAL lecturing was divided into two genres: providing in-depth examples of the above features and pointing up caveats involved in using the PASCAL. Most of the involved familiarizing time the in class NOS covering the first genre with the record type definition facility and set definition and manipulation. I 26 began the record definition material setting up doubly-linked lists: The data part of the node was This was first to opportunity progrrumming level of the class. TECO an example of one record type defined a node. the with be a character. I had to guage the They had performed the assignment quite well and I am not sure as to whether it was due to cooperation among members of the class or hints my in delivering the TECO material and assignment. in dealing with the list asked me to explain example, what a a number of students linked-list was. This, of course, led to a short treatement of linked-lists. PASCAL handles file I/0 was thought the class needed some PASCAL. So, in concert But The way then covered in detail. sort with of the spring-board lecture, I into I wrote an example program using a PASCAL implemented in Hamburg, West Germany that runs on a DECSYSTEM-10. The program appears in Appendix B. The program was examined in class. The "program declaration" heading differs from CDC's, and was pointed up in class. Before characteristics any assignment unique to text files were discussed. with I/0 in CDC's was made, covered implementation. It seems NOS interferes the following way: First, a bit any line is stored an an integral number of words at ten characters per an we word; and end-of-line condition is represented as 12 bits of zero (two six-bit quantities). positions Now for the catch: character between the last non-blank character on the line and the 12 bits of zero that represent the end-of-line are 27 padded out with blanks. means the user can expect before detecting the Under up to certain conditions this eight end-of-line! blank characters Second, the string delimeter characters in the CDC implementation are #'s, not apostrophes. Third, variable is not an changes are not the character for defining a pointer up-arrow, out of an apostrophe. There are only 64 Appendix C. it. The character set is given in Fourth, as mentioned above, the CASE statement has no "others" clause, and the compiler is that characters under the NOS timesharing subsystem-- and there is no up-arrow in in These capriciousness, but result from CDC's limited character set. available but the does not match not forgiving program will abort if one of the CASE labels the selector. While this seems to be "standard" PASCAL,· it is prudent to note that most, if not all, statements CASE should be enclosed in an "IF" statement for safety. The thought was raised in class that there is another (simpler?) way to read a string of characters into a linked list. One appends each character to the beginning list of the (thus not requiring a pointer to the last node of the list). When the string is completely read in, one simply reverses the list. The completing class an was offered its choice a another list. methods in assignment to read the entire contents of a file into memory in the form of linked represents of line lists. Each list and the lines are to be connected with Thus we have a linked list of text lines f ' 28 where each line characters. is represented Then the "buffer" is by a linked to be written list of to an output file. The motivation for making twofold: the above assignment was to familiarize the class with the NOS PASCAL and to give them a kernel from which to develop the rest of the editor. In completing the assignment, the class had its input and exit routines. By this time it was becomming clear might be implemented representation. discussed I The possibility but consider the editor linked-lists for buffer of using an array Q-register-type store text) a facility (somewhere when to the conceived as lists than as one long array that is not dynamic. Thus a Q-register is lists rather than managing free "core". loaded by of getting "Disposing" it. available memory. copying With this in mind, I tested the NOS PASCAL resident Dispose routine. consisted to basic component of an editor. The task of memory management seemed less imposing class was not encouraged much on the following basis. a temporarily using that The test an instance of a PASCAL record then The program aborted due to exhaustion of This is clearly not the action one would desire. Therefore, I mentioned to the class that we should define a separate New and Dispose (under identifiers) for each record type that is used by the program; different dynamically because, among other reasons, the pointer variables receiving pointers to these records are typed. The assignment strongly was trivial as the procedures are 29 given, as mentioned above, in the Grogono text. Perhaps the last major topic covered under PASCAL definition and use of sets. We decided it would be useful t o emp 1 o y t he s e t type i n i den ti f y i n g char a c t e r s from the terminal. well as "alpha". allow the was r e t r i e v ed Thus a set "digit" could be defined, as Unfortunately the declaration "TYPE CH results from the convention of strings on 60-bit words. NOS PASCAL = SET OF does not CHAR". implementing sets This as bit Only one word is used and, sadly, there are 64 characters in the timesharing character set. Therefore, as "illumination" to a lecture on sets and their manipulation, I wrote and ·handed-out a program that defines and employs sets; it appears in Appendix D. In discussing features of PASCAL with the class, I was able to fit examples to the discussion at hand, that is, text editors. specifying So when returned our discussion them the but not of specifying one. and they unanamously chose the latter. appears to have been the better choice. class user" masse In retrospect, this I was able to turn into more of a seminar than a lecture (at least for awhile) by continually asking what may be naive I choice of specifying the editor each of them would eventually implement to specifying one en the to an editor the class had gotten over some of its fears of writing one -offered we questions. termed "the For instance, the user would like to see, from time to time, the current state of all or.part of his editing buffer. The class considered from the designer's vantage the implications of allowing the user to 30 specify "type-out" in absolute and/or relative commands. The same question appeared in lines. When we tried terms of the deletion for a plurality in specifying the action of the insert command (and the form it should the class became embroiled "controversy". As continually think) (I the reminded I them that each syntactic the as-yet-to-be-specified syntax; and that definitions begged the definition of others. commands was decided that the user only. 1 ines? of progressed, that the user must be spared the complication of non-symmetric lines take) in what can only be termed a specification construct implied the form commands; of So could insert some For example, it or delete whole how was the user to "merge" two adjacent The "merge" command was thus born: "deleted" of it essentially the conceptual end-of-line of the first line and made the following line part of the first. The class responded immediately that we needed to specify an opposing command; one that could split user-specified point one a line command into two that I at a had not command per anticipated. The subject of allowing user-input line restrictions. statement more than one was rejected as a requirement due to time This allowed the class to use CASE for a control structure to the editor (by having a set of legal commands, then dispatching within statement). that the case Also, a method for saving and retrieving text in Q-registers was discussed. routine a replicated the One simply defined a copy line and character nodes of 31 desired lines (for saving) and the same for retrieving (new nodes are obtained, not pointers to data in the buffer). Later in the semester I handed reiterated the longer to create it. than assignments. "Q" document a few As and the RUNOFF source that It file One student requested that I give them weeks the only editor, I assigned all the "Change", a consensus on the editor specification. appears in Appendix E along with used out to complete assignment capabilities progrrumming pending except was the "Search", "G" to be completed 4 weeks before the final, with the balance due a week before the final. Pattern Matching The editor discussion was ended In searching ,techniques. simple searching Sahn i. However, by a discussion particular, I described the presented by Horrowitz and algorithm I had of meant to allow this algorithm as sufficient for our editor and therefore decided code it and test it in PASCAL. I thought the rest of the editor would be challenging enough (for one semester's work). I coded it exactly the way I found it in the book and it terminated with an "invalid pointer reference". NIL pointer value. Upon This usually means investigation, a I discovered I could not use the following construct: WHILE (NOT(P=NIL) AND NOT(Q=NIL) AND (P'.DATA=Q'.DATA)) DO BEGIN P:=P'.NEXT; Q:=Q'.NEXT END; It seems that PASCAL evaluates even if P=NIL or Q=NILL. the whole predicate, Thus the (P'.DATA=Q'.DATA) will eventually always attempt to utilize a NIL pointer. I left that particular code within a comment in my program FIND in Appendix F for illustration (and in case the above is vague). I found that I had to write some "ineligant" code in order to get around this shortcomming. seen context As may be upon inspection, I attempted to remain as faithful as possible to the original algorithm, but my introducing another variable OLDP. the value of P (which points into the this necessitated This is used to save target string) in anticipation of P being set to nil in the event the pattern and target strings are of pointer would be lost. equal length. Otherwise, the Other "ineligance" is introduced in order to ensure termination of the first loop. 32 33 As mentioned above in passing, this procedure is given in its entirety in Appendix F. pattern and the target algorithm (for string. It assumes two lists: I decided to Appendix G and is called FINDA. This appears This algorithm differs in one important way from FIND (because the pattern an array}. is in We find that yet another piece of information must be supplied-- the length of the pattern. becomes the illustrative purposes) so as to assume the pattern to be in an array rather than a list. in code the obvious The reason when we consider that with such a limited character set, there is no convenient character to use as a delimiter. We can't use a zero or any other non-character data-type as the array is defined as consisting of elements of type "CHAR", and character in the pattern. array PASCAL may enforces be this. considered part Thus, any of the Therefore, actual length of the string is passed as a parameter to FINDA. In other respects FINDA is In ~~~Q~~~~!~l! of Q~!~ ~!I~~!~I~!' algorithm FIND is similar to FIND (and in fact, simpler}. followed by another algorithm NFIND which exhibits better characteristics (in terms of number of characters to achieve a match}. examined As mentioned in the Objectives the algorithm compares the corresponding last character in strings before starting the actual loop. the Of course, this algorithm can exhibit behavior as bad as FIND, and this was mentioned to the class. 34 The TECO emphasis on algorithm the was preprocessing achieve generality and speed. pattern string characters. discussed of next, the with an data in order to The user is able to build that allows specification of special match In the following table a control character as represented two characters, followed a example, the match a control "X" is represented as 'X. characters consist of a control is by the character to be struck while holding down the control For a key. Some of character followed by another character, a few follow: Function accept any-cnaracfer in this position accept any separater. (punctuation, tabs and spaces) accept any alpha (of the 52 upper+lower) accept any digit accept any non-null string of spaces accept any of the characters specified accept any char except the following Character Tx------'s 'EA 'ED 'ES 'E chl, •• ,chn 'N The control-N needs explanation. modifier to a match specification. search string were meaning ~~1 'N'EA (control-N, It is used as a For instance, if the control-E, A) the is "aceept any character in this position if it is an alpha character. staged. In the The algorithm is conceptually first stage the pattern "compiled"; in the second, the pattern array is determining matches. two-dimensional. position The array may be The columns correspond to array used thought the two is for of as character in the pattern, while the rows determine matching status. The depth of a column is set to 128 (corresponding to cardinality of the character set). the In this stage, the array is set to zero then the first character is "set". A position string-specifying character is read and the 35 appropriate cells of that column are exclusive-or'd with 'true'. In non-case-sensitive mode, say an 'a' is read. The ASCII value of the character is used as an the a column for that character position. index into The program notes that the character was alphabetic and so subtracts an octal 40 from its value, thus making it upper-case. is also used elements as an index into column This value one. Thus of the column have been set.to true. two The program then increments the column index and the second position in the pattern string is set. For an example of a special match-control character (from the table above), let's say a control-X is read. The control-X, as may be remembered, specifies that any character found in this position of search string is accepted as a match. in column two operator. are set true Thus, all the cells using the exclusive-OR if the control-D, we would have set only values the ASCI I values of zero through nine. Obviously, to the corresponded to character the cells been whose a row Using this mechanism, the control-N operator becomes trivial. Upon reading a control-N in setting pattern string, all the cells in the column the had inclusive-OR operator. are up the set with After setting these cells, the column number is not advanced. Therefore, when any other arbitrarily complex specifier is encountered, the action of that specifier is to turn off those bits with well. effect with an exclusive-OR. it is concerned Our first example would serve If the 'a' had been read following a control-N, would have been to set to the false the two cells corresponding to upper and lower case 'a'. 36 The second stage of The above. algorithm the algorithm attempts to match "string" with characters in the buffer. found, the resembles If the no "FIND" pattern match is buffer pointer is moved only one character and ,q the match is tried again. a very few However, this is carried out instructions as the algorithm is written in a powerful assembly language. found to A character in the the buffer The instruction accumulator as an index register into the row in the pattern table corresponding to the ASCII value character in the accumulator. of the buffer if the cell is non-zero (i.e., the characters matched). the characters did not match, the process is repeated character the It skips if the cell is zero and returns to pick up the next character from the is match by loading it into an accumulator then the determination is made in one instruction. uses in If with following the first character in the target string as the new beginning of the target string. Unfortunately, after there was insufficient the time TECO to search cover algorithms mentioned in the Objectives. formatting had come. was covered, any of the other The time for Text , Formatting Two format programs are discussed: Tools and the other, RUNOFF. modeled on RUNOFF, but is written system. one from Software Actually, the former was in RATFOR for a UNIX Most of our discussions concerned the former, but RUNOFF was appealed to for justification. by Kernighan and purpose of a formatter is to neatly format a document that is pleasing to the eye 1 • The " p 1e as i n g t o t he eye" i s mi n e and imp 1 i e s a mi n i rna 1 de g r e e of page layout. Plauger, As mentioned the The commands implemented in this formatter are characters in length that always follow a period. Some representative commands follow: Command Function toiiTii-T5 "fill" mode turns off "fill" mode force out current output line and start new line produces a blank line sets spacing on the page "begin page" -- skips to top of new page, numbers it "n" center the next "n" lines underline the next "n" lines sets left margin sets right margin do temporary indent of "n" spaces -:-rr--.nf .br .sp .ls n .bp n .ce . ul . in . rm .ti n n n n n Most of the commands are setting of margins, etc.). specifying header and head" for "status setting" {the There is also a provision for footer lines {meaning a "running {foot) that is put at the top (bottom) of each page. The treatment of this genre is quite good, in that the text is exemplifying therefore shows prograrrming the by top-down methodology. development 37 of some of the It more 38 important routines through up four generations. that i t detects whether we are at the top of past puts out the current line; the footer then a "buffer" is is This invoked. typesetting programs, as justifier is one or made to see One level down, we more similar will If so, word the structure be into the justification is later similar discussed. to The to typesetting programs in that all the "spacing" is done before any of That page responsible for detecting whether the word will overflow the measure. routine is test should be output. find that the routine that puts output the bottom (in which case the header is printed) and the whether The code is encapsulated in the output routine in layout page to the line is output. is to say, the output buffer is an array within which the text is shuffled until it is justified "in afterwards core", and the contents of the array are output (up to the "linewidth"). Another useful concept pointed up is that of distinguishing between the number of printing characters in the output buffer as opposed to the number The non-printing characters be typesetting characters. backspaces underlining) or later, characters controlling the typesetter. for in may of programs, (for "meta" Thus the user should be sure the total number of characters in his output buffer does not overflow the buffer length. The concept of formatting programs. called another by formatter. Its the "break" is central to text A break may be issued explicitly or of the routines that comprise the action is to force the output of the line 39 in the output discovering a justification. and in without "begin page" or "space n lines" command, a call to the break routine is made to text Thus, buffer finish the previous insure the text following appears in the correct place on the document. It was mentioned accepts above the underline command a numeric argument (corresponding to the number of lines to be underlined). just that a Also, if we wanted to underline few words of a line, we would put these words on a separate line. Since the "underline" command does not cause a break, the words on that separate input line appear on the same line implementation is as the text encompassing straightforward: it. Its immediately after the underline command is parsed, the next line is read into the input buffer. Each character of this buffer is copied into a temporary buffer -- but with a character word. the following each backspace character of and underline the input buffer Then the whole temporary buffer is copied back input buffer and the words into are sent to the output buffer by the normal operation of the program. The centering routine is even simpler: a variable "tival" it simply sets that is sensed by the output routine. Tival is mnemonic for "temporary indent", and routine of spaces before simply outputs outputting the line. "tival" number the output 40 Probably the main drawback I found with the algorithm for justification is that it is tricky and Tools elaborate (the word "tricky" None the less, is their own description). the algorithm was discussed. afterward, however, I presented the algorithm uses, which former in that "justifying" spaces to the output line as £~!~~!(i.e., line Immediately that RUNOFF seems, to me at least, more accessible. algorithm differs from the the Software it That adds the the!!.!!.~!.~ ~~!.!!.K On sent to the output buffer). discovering to be full, RUNOFF calculates two values "EXSPl" and "EXSP2". The first is the number of spaces to be added ~~~£~Q!.!l£~~!!Y to each interword space in the line, while the latter's interpretation is contingent on whether we are weighting the "white space" to the left or right of the line. That is, the number of extra spaces is added to the right or left of the output lines in order to avoid "rivers" -- gaps of white space on one page or other. So, side add a words. Thus If or "real" space extra spaces to the right. we are a of space on the the biasing Therefore an extra space will be added between words 7 and 8, 8 and 9 and 9 and 10. diagram to we need to add three spaces to the line to make it justify, exsp2 will be equal to six if the space if we have a line with ten words on it, there are 9 interword gaps 1 i ne. the output space, otherwise it is a count of the number of interword gaps to pass up before adding an between of if we are biasing the spaces to the left, EXSP2 is the number of times we "real" alternately A flow algorithm that was presented to the class appears in Appendix H. 41 At this point the class was given a midterm. of The text both the midterm and the final appear in Appendix I. I considered it a somewhat easy test, but then again, I wrote it. The results were a bit surprising to me in that I thought people were absorbing the material better than demonstrated. There were excellent scores: high 80 and 90 per cent range. The scores were 97% and 28% respectively. highest was 5 in the very and lowest Typesetting 1 This section typesetting methods. were concerns the of business -- by both "early machine" and phototypesetter The corresponding chapters covered 1-9, teaching 12-15 and in the book The content of other chapters 21. (that deal with word processing) are dealt with using other sources. The area difference is introduced between with typewriters a discussion and typesetters. Executive is an example of "cold type" in that no cast. If any plates are made, it is a of the The IBM lead is result of photographing a typewritten page produced by the Executive, then making offset plates from that. The Executive also introduces the student to the idea of proportional spacing. This the is case in which proportionately different for a an "1". the C~?ital machine "M", say, than for The concern here is for the appearance of the text type is traditionally designed so that are "escapes" naturally wider than others some characters and the Executive is a compromise between a typewritter and a typesetter. Since this was an introduction phase, its use is also demonstrated (verbally). device for locking rows of type close the chase and The chase is a together for the purpose of either making plates or printing directly onto a 42 43 ,, In such a device, all the lines are required page. of the same length. Therefore, added. are From this came the very right edge of the chase. (called lead concept the justifying lines, where the last word of the at be a line is not wide if enough to fit in the chase tighty, pieces of quads) to line of appears This is accomplished by adding pieces of metal between the words (and sometimes between the letters), and is called "inter word" and "inter letter" The spacing. comparison between introduction the includes linecasting short a machines and the Monotype, but I will postphone such a discussion until they are treated at length. Probably the most obvious difference between typesetters and typewriters is that of character repetoire. For example, a typewriter has only a hyphen it character and must fulfil the role of the n-dash (used for separating say, dates: Another 1888-1960) and them-dash--used for difference is that of achieving justification more smoothly than a mono-spaced font device emphasis. can. On a mono-spaced like a typewriter, justification cannot be achieved by distributing small increments of space among each of the inter-word spaces increment printing are is attributes: the ease the because same concerned the 1 with smallest unit. Thus word largest traditional the aesthetics and two other readability and legibilityi. Readability is of reading a printed page (the type arrangement) and legibility refers to the speed with which or and can be recognized (the type each letter design). And of 44 course, a printer also has bold and italic fonts to draw upon in order to draw attention to specific phrases. For purposes of discussion, it should there are two mortise. to noted that 72 "points" to the inch and one "pica" is equal to 12 points. type, be In detailing some of the characteristics terms come to mind: the ability to kern and Kerning is the process of causing two "overhang" one another. characters For instance, if we wanted the cross of a capital "Q" to underline the next character, could the cross Mortising is so overhangs the body of lead it is cast on. Thus any character that follows will fit under another aesthetic nicety the cross. wherein characters that normally would be spaced too far they we put both of the characters on the same piece of lead (and thus creating a ligature) or we could cast the "Q" that of two apart if were to "sit" on their respective bodies are "pulled" together by physically sawing off parts of bodies. This each of their could be desired in the case of a capital V followed by a capital A. -------------***-----------------------------------*** t--- ASCENDER *** -------------***-------------------------***-------*** ***** *** ***** **** **** * **** * * * X-HEIGHT --:t *** * * *** * * *** * * * *** * *** * * *** * * **** * * * **** * *** **** **** *** **** base line----***-------------------------***-------*** DESCENDER -----------~ *** *** -----------------------------------------***-------- 45 Referring to the above figure, the point size of is generally given as the height of the body, where the body is the distance from the top of the ascenders type type of to the the bottom of the descender of the type. one font Note that a lower ~ase "l" may be larger lower "l" of another font of the same "point size". case in the than This may occur because the lower case l "sits" on the a base line -- and the base line in one font may be in a different relative position than position the base line of another. of the base line is determined by the x-height of the font, where the x-height is the height of a x (it has no ascenders x-height of a font, the or descenders). shorter its descender capital base-line also; so some, in turn, are larger in letters are another of the same point-size. lower-case The larger the Unfortunately, than The can be. on the dependent one font This in turn creates _, the problem of getting characters of various fonts to "base aiign", meaning they all sit on the same imaginary line. This brings up the topic of leading. distance between "solid setting". space added the base lines spaces between rows of type. out is the running text, in a A solid setting implies there is no extra consecutive lines of text. ·leading derives from the days in were of Leading by which The term consecutive lines inserting strips of lead between the It is used to make a page more legible; the useage is "11 on 13" where "11" refers to the point-size of the type and "13" refers to base-lines. the actual distance between This implies there is "two points of leading" 46 between the lines. Another handy term of measurement is the idea "em" space. fact the It is the amount of horizontal space that is equal to the point size one is setting in. the of It derives from that the capital M was traditionally square, and that it is the widest character of the font (normally). "en" space space. is either 1/2 an "em" space or 1/3 an "em" The "em" concept is very handy in specifies a relative width. that the em-space (to always be defined is the widest character in the font, and all the other characters are defined in em-space. it When using typesetters that use a relative-width system of escapement later) An terms of the Thus if there are 18 relative units, an em-space is 18 units wide, while an en-space would be 9 units (if it is defined as 1/2 em). If that were not "set" should help confusing us along. enough, the concept of It has two definitions. In terms of the design of the original type, case when it defines the the width of the em-space has been narrpowed or widened (the height remains the same). Thus if a character is 6pt/8set it means it is 6 points high and 8 points wide. 8pt/6set implies the opposite. types e t t e r , subtract achieve width from a of it is the 2) In the context of a t he amo u n t of " s i de bear i n g" t o add o r character's normal width. Thus we 6pt/8set appearance by adding two points to the each character that is set. With photocomposition device, the amount can be negative. a - - - ---- -- - - -- - -....-- ----- ---~------~--- ------~---------- I First Generation Typesetters Seybold concerns himself with two genres of typesetter in t hi s c ·a t ego r y : typesetters. 1 i n e cas t e r s and Mono type-s t y 1 e The Linotype is covered first. The Linotype is still used in the United States, and has was the most popular style of typesetter throughout the first half of the Twentieth Century. hand-composition are significant. Its advantages over First, a printer can set over 6000 characters per hour on the Linotype as opposed to 2000 characters per hour by hand. Second, the matricies (type bodies) are automatically recirculated back into machine's matrix magazine (this implies sorting them into the proper positions). composes slug. be a line at The Linotype is so nruned because it a time, the lead "line" is called a The slugs are "recirculatable" also in that they can re-melted for reuse. Third, justification is simple: when the operator strikes the space bar, a "spacing" is dropped into the setting channel. be wedge When the line becomes within "justification range" (the point at which may the the line justified aesthetically), the operator may command the machine to "set" the line. At this point the machine pushes the spacing wedges causing them to become wider (the wedge consists of two "wedges" that, each when other, form two parallel planes). thus force the words apart with an equal between against The spacing wedges amount of space adjacent words, until the line is spaced apart the width of the measure. injects placed Immediately afterward, the machine lead on top of the matrices to create a lead block 47 • 48 of type. When a hand-compositor tries to achieve the effect, he must do it by experimenting with same various interword spaces until the correct width is achieved. last advantage one of order: (and are of the linecaster over hand-composition is blocks of type are easier to keep all the same width) individual pieces of type. "lost" The than in order lines comprised of The individual pieces are also for the period of time they are used for the actual printing. The Linotype has four divisions. magazine character Each of the matrices is the two contains a standard and an ltallic of that character (usually). magazines are interchangeable and faces. first that contain the matrices (a magazine corresponds to a font). have The thus the The operator may magazines -- one for roman and one for bold type Then thate is the keyboard which releases into "channel" one matrix per corresponding the keystroke. As mentioned above, there is a casting mechanism which injects the lead, and finally recirculates each utilizing special a the distribution mechanism. used matrix sort into the magazine the originates: terminology "upper must and occurs, of the and "lower ~apital Linotype is case" letters! is if a be made on a line, the whole line must be recast and possibly the widow case" the upper case contained the One of the disadvantages correction by key on the top of each matrix. Some linecasters may be fitted with two magazines, where This the surrounding whole page is lines. sometimes Also, if a reset to a 49 Obviously, pages are "pulled" throughout tighter set size. the operation analyze to for errors. With composition software for second- and third-generation typesetters, the detection of a widow is detected before any copy is pulled. Other more obvious drawbacks are that matrices may be mixed-up (they are sometimes dropped into the assembly area of the machine out of order-- a "feature" of the machine.) The Monotype was introduced a year after the (1887). This Linotype. It is divided into two main keyboard and machine the differs caster. Linotype substantially The from constituents: blowing air against the tape. In the e~ent through a hole, it blows against a small piston, turn actuates the setting mechanism. a mechanism for determining justification range. counter which which the Because the keyboard number sum of the There was of These counters were units used a mechanical widths also a of the counter tape to calculate the required for justification, which number was punched at the end of the line. backwards to access the The caster would read amount of space that needed to be added to each interword space as the line cast. be tallied the number of interword spaces that appeared on the line. the by provided characters as they were keyed. which in whether the line was within This was counted hole air gets is physically separate from the caster, there needed to a the two communicate by paper tape, the caster senses the presence or absence of by the was Justification is achieved by adding discrete amounts of space between words, as opposed to a Linotype. 50 In order to tabulate all these runounts, there needed to be a counting system to relate all the character widths. A relative system was developed wherein there were so "units" to the "em" (which is, as may be recalled, the widest character in a font). chose many The inventors of Monotype to implement a system in which an "em" was 18 units. Thus, an "en" space would system will work for be any 9 units. width "em". for system that this point size, as characters are measured only in relation to their This Note caused type designed in specific quanta (in relative to an the monotype to be relation to each other, rather than randomly as for the Linotype). The casting component contains a removeable matrix (in the mathematical sense this time) in that it contains a number of "combs" locked side by side, where a row of character.s. the machine at one time. of the same set size, and relative units customized. matrix to a 18, "normal" range although to a wedge. essentially a "hardwired" table. wires are available is usually this given The can be from 5 easily configuration of normal wedge is are no Although there involved, given notches on the wedge signify a given set size. Thus the wedge describes combs the in matrix. the configuration relative units of If the user decides to reconfigure his matrix by making one of the rows contain characters 9 a Each row contains characters the Corresponding is is The combs contain 15 charactes each, and there are 15 combs, thus 225 characters on comb rather of than say, 12, another comb that 51 reflects this data Another must be inserted into the machine. correlation that must be dealt with is that of the code punched onto the tape by the keyboarder and the actual piece of type that appears in that position in the matrix case (a modern equivalent would be changing type "balls" on an IBM selectric typewriter). The actual casting process proceeds in manner. the following The caster reads the tape which causes the matrix case to be moved in a manner such that appears over a "casting" hole. the proper At thus point a square tube is fitted to the selected matrix and molten lead is into the tube and impression of the type against to be repeated until the line is set. matrix the forced matrix face, causing an formed. This process is TTS Perhaps invention the of most these development important occurred typesetters after in the 1930's. Walter Morey invented the Teletypesetting code. It system (to for sending typesetting typesetters) over telephone lines. quickly as would information to be was It didn't catch be expected for two reasons. newspapers didn't want locked the into a a drive on as First the particular editorial content, and second, the local typesetting unions did not want to lose the local keyboarding ~ork. However, by 1951 UPI began sending pre-justified anyway because some newspapers by that time were using TTS internally to their Linotypes. This implies, of course, that the to the matrices in Linotypes had to have been standarized 32-unit-to-the-em This was done estimating rounding to with the system not a by (TTS instituted more accuracy). redesigning micrometer nearest drive the the 1/32 of type, but by value of a matrix and an em. This caused rounding errors that usually balanced out by the end of the line, but not always. The tapes were punched on a that typewriter resembled perforator". a In the very early called days, a Linotype machine "multiface keyboards were fitted with punches connected to solenoids, so that as the paper tape automatically. was read, the keys were punched Later, of course, linecasters accepted tape directly. 52 53 TTS is a descendent of both the Baudot Murray code. and the The Baudot scheme allows the user to strike a chord on a 5-key keyboard. providing code 32 distinct used a keyboard that The cord is fed a serially The Murray code representations. resembled out typewriter. It also produced a 5-level code, but in parallel, much in the style of the modern alphabet, paper It tape. used 26 codes as the and two for line feed and carriage-return codes, respectively. There was also a shift code that acted as a When in "shifted" mode, the codes became figures. toggle. There were no lower-case characters. Morey adapted this system to a 6-level code, thus adding 32 codes, and used it to drive the information typesetters theory described (wittingly or above. not) used He and assigned the characters used most frequently to the representations with the least holes. This minimized tape wear; one-hole representation. conjunction lower case The addition a space had a of 32 codes in with a shift-precedence code allowed upper and characters. In addition, he assigned such typesetting commands as: thin space quad left upper rail lower rail em space rub out quad center Upper rail would matrix, which select the upper portion of the would have the roman font as opposed to the lower portion which would contain the itallic. ' . The Advent of Photocomposition The devices generation described typesetters". herein The are termed reason is that while these machines accomplish composition by exposing a medium through a "master" "first character photographic image, they do not substantially differ from their hot metal progenitors. instance, time. they are capable of setting only one line at a Then the paper is moved and another line An example would be the Intertype Fotosetter. same manner as a Linotype. piece that character on front of a of it~ by a The much the The matrix photographic film with the image of Thus the light source, then mechanism in keyboard. matrix is suspended light is strobed. '~e· result is a character is "written" onto the spot.. "set". Each character is on a separate matrix which is "called up" a is This machine has a system of machinery that uses matrices contains For film in The at that that contains the light source then moves a distance determined by the "width" of the character (encoded in the matrix by virtue matrix) and the process continues. of the width of the At the end of the line, the camera mechanism measures the height of the accumulated matrices to determine justification. All the amount justification of on space needed this machine effected by "letterspacing" -- the padding out of the by adding space between letters in the Fotosetter provides the capability of various (from 6 to pre-focused 36 pts.) lenses. words. point for is line The sizes by incorporating a system of eight Otherwise, 54 the Fotosetter very 55 resembl~s strongly a Linotype (i.e., matrix returning mechanism, etc.). Another new entry at the time was the is based on the Monotype. Monophoto. In this case, the matrix-mat (that contains the character images) contains images that are It positioned below photographic a lamp using the same machinery a Monotype incorporates. The character image passed then through a few lenses, imaged photographic medium by travelling mirrors. mirror that determines is onto the Thus it is the the position on the film that the character is put. The AFT Justowriter is typesetter f r om an ear 1 i e r typesetter. of a t e c h no 1 o g y , t hi s that Like the Monotype and Monophoto, tape-punching part and a much like a typewriter. phototypesetting unit. a evolved t i me f r om a· s t r i k e-on it consists tape-reading unit. tape-puncher punched TTS type codes, while is also the The reader is The typewriter part evolved into a It is simpler in workings as there "standard" set value for the type, and only one size of type. source The imaging is accomplished by through characters shining on a daisy wheel. a light Incidently, the wheel stops before the light is strobed. The first second generation typesetter was the 200, developed characters in the late 1940's. through photographic Photon The Photon 200 images film masters that are arranged in concentric circles around a disc. There are 16 fonts in the eight circles. selected Thus, a font is by 56 moving the axis of the disc, then the character is strobed onto the medium as it passes the light remains in motion during this process. "sized" by the use of turret. 12 lenses The disc The images may be mounted on a rotating One of the innovations developed with this machine is that of the "width card". table source. for character This was essentially a lookup widths, and, the width cards are changed to reflect the characteristics of the fonts in use. A competitor to the Photon Linofilm. 200 the Mergenthaler This device also imaged characters by strob.fng a light source through a photographic case, is there master, are 18 grids of one font each. mounted on a carousel and, when a font and but in this The grids are character are requested, the carousel rotates to the proper grid then all the grid is mechanically masked out except the desired one. The image may then be sized through a resident zoom lens. Second Generation Principles There are many similarities in the of For typesetters. negative storing images the on font will be selected. instance, all photographic second generation fonts are stored as The film. manner of is determined by the manner in which it Two steps are involved in the latter process: 1) position the character in front of source the 2) mask out, somehow, all characters one desired except Let's examine some characters of the attendant light the pitfalls. If are exposed using shutters to mask out unwanted characters, it is not uncommon for the shutters either to stick or to clip off portions of the desired character. In the case where characters are selected by means of moving the axis of a disc, vibrations are often induced, causing a blur of the image. characters on This the was avoided sending "null" tape following a font change, but this is, of course, an undesireable objections by feature. Then there are to characters that are exposed "on the fly" (as with the Photon 200). The claim here is that a faint for "sizing" "tail" can be seen following the character. There are almost as many schemes characters as there are typesetters. Sizing is the task of reducing or enlarging a character designed certain point size. The utility is for font. Some as a that one need not maintain up to seventy different sizes of a each use character for machines do character sizing by using a 57 58 separate image size for fixed-size each font used, and so use lens (as with some Compugraphic equipment). this case, only certain sizes are desired, perhaps three. good a In two or The resulting image is excellent, but this type is mainly drawback for of "straight" some of text, these i.e., machines change the font strips without galleys. One is that one cannot exposing the photographic medium. Some typesetters have two lenses that may be switched, but they must be changed manually by either unscrewing them or revolving a turret. phototypesetters Aside from these examples that can devices use a zoom lens. type in case. the Most size are automatically. It is implied that one These can machines with zooms can size at specific To achieve better appearance, some devices maintain fonts in a few size ranges, the lower two 6-12 pts. caveat (or another good on the same line: turrets to be on different lines! of resolution lenses since in some they are not as Thus the as to simulate adjacent chacters Of course there are also devices that fixed-length offer lenses. sometimes include special lenses for distorting the so An as the characters are sized their apparent base lines do not match. appear appearance. one, I should say) is that many machines can offer this sizing, but characters ranging and 12-24 pts., respectively, so that they can be sized within that range with important set range of 6-72 points, but this is rarely the discrete sizes. from the better They images italic, and others that "squeeze" the 59 characters into a different set size. The issue varied of character treatment. There registration is a number First, the photograpic medium may implying spot. that the image Or, the image advances vertically be lens moved may move while at the end-of-line. A variation would mirror that sweeps the medium as it moves. A bundle to the film Then again, both In this case a up the image and channels it to a moving lens. fiber-optic horizontally, is always projected to the same assembly picks received of alternatives. the image and film may remain stationary. collimnator also page, be to laying different convey have prism or the images on the method the a is image to use a to the spot of exposure. Character escapement -- the providing of space between successive words or characters, is the last major concern of second generation typesetters. calculating escapement afterwards. the before escapement this sizing must be larger concerns, the If the sizing is done before the width we were to move, since the In There are two image registration or smaller than the true character is distorted. case, the magnification factor must be taken into consideration in determining the distance to move. sizing or is done after the the escapement, the problem is not quite so bad since the escapement magnified If of along with the character. the true width is The escapement values must all be translated from the relative measurement system of ems and ens to the absolute machine units (internal 60 units), whether they are in parts of a meter or parts of point. Thus if we had a 9 point font and needed to lay down three units of space, and the machine moved in of a point, and the system 100ths is 18 units to the em, the following calculation would be performed: 9 pts ----- * 1 em 1 em 18 units 3 units * a 100 internal units * ----------------point =150 IU. Third Generation Typesetters The main distinction generation of photographic character between typesetter medium master not is that directly the third and second the third exposes the from a photographic but from the face of a cathode-ray tube. Some other distinctions are: 1. There are no mechanical parts involved in selection of the character. 2. The device can manipulate the characters electronically (for sizing and skewing). 3. In most of the devices, there is no need for font masters, the fonts are stored digitally on disc files. 4. They are almost always driven by a computer. 5. Some can compose full pages without moving the paper. The Linotron 505 may be thought of a "2.5 the generation" typesetter in that it scans photographic masters within the machine. with The masters are arranged in a grid of 16 16 characters per plaque. plaques The character selection is accomplished by producing scan lines in a relatively patch of space on a crt. small These lines are imaged through a grid of sixteen lenses (all in the same plane), through one character position out of each selecting 16 characters out of 256; of the 16 plaques (thus actually, 238) then through 16 more lenses, (again, all in the same plane) onto 16 photo-multiplier tubes. the 16 by One character is selected from accepting input from only one of the 16 PMT's. The impulses from the pmt are fed to a second CRT that sits on a travelling bed. The automation of the scanning is 61 62 effected by the carriage sensing per inch); with calibration lines the sensing of each calibration line, a new scan is begun, thus the characters are laid vertical strokes. characters There each; windmill-looking the can be grids wheel. four are This by causing the down grids mounted with of 238 on a device skews characters by skewing the scan lines on the ouput CRT. accomplished (1300 output Also, CRT sizing is to amplify the stroke lengths by the desired magnification size. The technique photographic of creating images masters has some limitations. by scanning The images are not as "clean" as they could be because in the scanning a of straight line, the line is detected randomly (the signal o s c i 1 1 at e s ) . not A1 so , the "p i n c us h i on e f f e c t " of the adequately compensated for. CRT is This is handled, to some extent, by locating characters that distortion that has a minimal effect on (as some punctuation) at the edges of the grids. Another drawback is uniformly, nor that the do P~AT's not age do they stay calibrated uniformly, causing some characters to register brighter than others. The Videosetter character is one is another machine grid within the typesetter. character characters are grid of projected dissector tube" (like character selection. a 106 onto that television) face of which the All the the "Image does the It too uses a second CRT for output, but this CRT has a fiber optics bundle fused to Since a In this case, there characters. the scans its face. fiber optics bundle comes into contact with the 63 photographic mediwm, there is no sizing done This values for the characters directly off the character grid. Next to each character is bar reads the lenses. width a machine with code encoding 4 bits. A constant, "3", is added to each value obtained, thus giving the capability to encode a range of 3 to 18. In later versions of this machine, 8 grids were added in turrets to allow larger "on-line" font capability. Digital phototypesetters differ from the preceding devices in that the character masters are scanned only once (and usually on a different device.) The character data then stored lengths. the in memory as "stroking information", or run Characters are written one at a time beam on is by turning and off for limited duration defined by "on" and "off" edges of the characters. Characters are usually digitized in three or four size groups at 450 to 900 sizing is accomplished widening them strokes/inch. by lengthening The reason is that strokes and also by increasing the voltage to the CRT. Thus if a character that was digitized at 720 lines per inch enlarged inch. be 50%, it reduces the number of strokes to 540 per Even with thicker lines, character pushed too is far larger point sizes. as the stroking cannot roughness can be detected at We cannot start with an assumed large point size and scale down, as this will cause the character to appear too dense and smudged. enlarge an effectively. image by Thus it 40-60% and is reduce possible it to 20-40% 64 The early versions of the RCA Videocomp fonts to be used to tape, then intB core. core and a required all be loaded at the front of the data If there character of weren't another enough fonts in font were required, a complete font was required to be overlayed from disc. This shortcoming was overcome in the 830. Autologic developed machines that set the type as film advanced past a speed up the process. sizing, so every These "window" in an attempt to machines had no electronic font was digitized separately. not as bad as it seems character setting stroking as data the in company a the also compressed This is stored format, the thus packing more characters into memory. A totally different method of producing characters was introduced in the fixed-head drum (it however, weren't patches of dots. drum, it Fototronic CRT. held chars). stored 1024 as characters, As the information was read in from the was used to direct the movements of the CRT beam Thus it d i dn ' t rna t t e r the initiation of the read took place as long as the cycle lasted only one drum revolution. size The stroking information, but as i n "spray i n g" t he char act e r out . where It stored fonts on a This machine could electronically and produce forward or reverse leading in 1/10 point increments (the utility of this will clear in the discussion of vertical justification). become 65 Most phototype information as setters receive predefined strokes their character that is, "stroking" takes place off line by reading what are "perimeter" permiters characters. formed typesetter by reads Perimeter co-ordinate perimeter that pairs. The characters from Metroset disc the a density. Another they are This "look innovation ahead" algorithm characters of fonts that are not in characters in. This is done by largest point size and throwing away strokes to narrow it down. implement and all the characters, no matter what the point size, have the same stroking assuming called characters are simply generates stroking patterns as the data is read implies the may be needed. it made that core, so was to anticipates that these read-in in advance for use at the time The other major manufacturers soon adopted this technique. One would think that newspapers would have been to use phototypesetters, but they needed to quick await technology for making plates from photographic film. however, they are they can include half-tones as of the composition process, and they have better logo versatility. them Now, the largest users as they can have all the type fonts on line; part a to take The speed of classified phototypesetters advertisements minute, thus creating more revenue. up also allows to the last Composition Comp/Set 500. This particular typesetter entry" style typesetter, meaning each line is it is typed in. is a "direct composed as Before beginning a sequence, the operator must indicate various control parameters: 1. point size 2• f on t numb e r 3. line length (in picas) 4. leading 5. manual or auto end-of-line decisions 6. minimum interword space 7. maximum interword space (expressed in relative units) In manual mode operation, decides to end the line. In automatic user types until he The machine aids in the decision by "beeping" when the operator range. the has entered justification mode, the operator continues typing and the machine ends the lines automatically and pushes any word The that doesn't drawback of fit to the beginning of the next line. such a machine is that i t cannot redistribute lines that have already been set (for vertical justification). to move The vertical justification may be desired a line from the bottom of the current page to the top of the next. 66 67 More sophisticated processing can be using a "front end" computer. accomplished There are two contexts for this application -- batch and interactive. system by The interactive is similar to the Comp/Set machine described above, except the system writes a file for driving rather than setting text immediately. perform sophisticated dictionary. hyphenation a typesetter This system can also using an exception In addition, the user has the option to reset the entire page using other parameters if he so desires or, the file contexts. "override" desires. that, may be sent for typesetting and reused for other The interactive system relies hyphenation offered by The mechanism is called on the a the user system to if he so discretionary hyphen when inserted by the operator, takes precedence over the system's hyphenation. A batch system is a much larger software package that contains extensive logic for balancing lines on consecutive pages and other considerations. that It usually files have been marked up much as a RUNOFF file is, and the imbedded commands direct the software as to to reads take for the given context. was discussed in class; brackets. Thus its commands appear within tf,l;ps,36 drives a encoded particular into the typesetter. phototypesetter driven by 8-bit square instructs the program to set names) and a point size of 36. the program are actions A language called PAGE-3 the type face to "1" (which is an index font which into a table of The commands given to binary Each codes language font reserves used seven that by a to 68 nine of the codes as meta characters for providing escape sequences from the normal text. The Comp/Set capability to typesetter set characters sizes on the same Videocomp mentioned line. of above lacks the widely differing point However, for machines as the 500, this capability must be provided for in the composition software that outputs code for that machine. Therefore, the software scans the next line in its entirety looking for the character with the This value then advance the CRT becomes bea~ the largest body leading. runount the typesetter will when it begins writing the new line. Getting back to justification, a compositiori package can take one of four main tacks in its end-of-line decision processing. buffer In all cases, words are set until negative. throw the counter into the output of space left on the line goes The first tack in dealing with the line is to away that last word and attempt to justify the line without it. The second tack must be considered if the line would look to spaced-out without the last word: the remaining spacing words also. A on the third line using alternative is space out inter-character to hyphenate. Otherwise the program should go back to previous lines attempt to reset them tighter or looser in order to make the current line set correctly. this is to keep and at least resetting at any one time. The three way pages Page-3 handles available for 69 The specification of text position on a page is also a capability of batch systems. into "blocks". provides A block the block must bounds be language, is for defined, before any All typeset material is set an the using type imaginary rectangle that text as it is set, and the the commands can be set. within the There may be any number of blocks per page, and the blocks may be positioned anywhere on language. many the substitute stream. the page, also with specifiers within the Blocks may be named and used multiple times on a page . or allow a user pages. to Most batch compositon systems also define typesetting macros commands that, and/or Added to this is the ability to variables controlling conditionally execute statements. language really is a language. the Thus when text test called, into various composition, the the of and composition Hyphenation Mention was made above routine hyphenation justification quality is routine. incorporates of hyphenation actually Most extension composition A of the aesthetic of hyphenation, but the degree to which one relies upon it is dependent on newspapers instance, an routines. hyphenate the application. the most often, however, high-quality books try to minimize it; For while a total lack of hyphenation can be as disasterous as too much of it the result is a page with "rivers" of white space in it. We know that when we hyphenate we do so on the last word of the line. For this nature of a word. purpose, side hyphen on must define the exact A first approximation would be to define it as any string nf non-blank either we of This it. either side characters may of with spaces on be extended to include a i t the also "micro-computer" may be divided at the hyphen. word We can also add the em-dash and sometimes the slash. Now that we have our word, we find we must "clean it up" even more by stripping the punctuation from either side of it (remember, this is hyphenation point). for purposes of discovering Another class of characters that must be removed from the word would be those of the language that may the user may wish "forever". for(fn,2]ever[fn,l] (itallic, say) be imbedded in the word. to The a hyphenate mark-up the commands to specify that we composition For example, sub-word mny switch before setting the word "ever". 70 "ever" appear to font in as 2 There are 71 some exceptions, but they must be handled as special cases, for instance, with 6:45P.M., the P.M. must appear with the 6:45. There are two main approaches to hyphenation: and by dictionary. Often the two are blended. logical One of the methods of the logical approach is to take a four-character window into the character division can be made mid-way. characters that will and will printers). to see whether a ·It starts with the last two fit on the line with the hyphen and looks at the next two characters. break checks This assumes the first achieve the tightest fit (a preference of many Thus if the word that is to be divided is "photographer" and the characters "photogr" will fit with a hyphen on the line, we analyze the questions are asked: tetrad "grap". Three 1) can a syllable begin with ap? can a syllable end with "gr"? and 3) may an be hyphenated between an "r" and an "a"? English word The answer to the second question would be no, so the algorithm moves to left in 2) the the word, character by character, posing the same three questions until the answer to all three is "yes". Some hyphenation routines start at the first character of the word and work to the right, searching for a possible breaking point. This insures that if we take the first breaking point, we find it will either set or it won't, but we needn't keep trying as any other attempts fa i 1. clearly will 72 Another method analyzes For pairs. character instance, a syllable will never end with the characters be, bd, bg, bh •.• be, bf and a syllable will never with bb, Thus for each letter in the alphabet, we keep all the illegal occurrancesy as begin checking a This, however, is only useful mechanism for some general hyphenation routine, as it tells us only what Hyphenation is more either side of a basilic both purpose won't work. challenging when there is a vowel on consonant. hyphenate For in instance, different basil and fashions. One therefore has a 50% chance of being right. A less storage-intensive structure analysis. suffixes around hyphenate. is to use word We merely store a list of prefixes and which Then method words easily hyphenated. it is as usually acceptable to reentry, restructure, etc are Dis-, in-, mis-, and un- are common prefixes, ~hile suffixes. One still runs into trouble with such exceptions as inchworm, -er, -tion, -sion, -ible, -ness are common inkless, union, and unified. resolved with more logic, but one must the system logic. is There to be somewhere if overburdened with hyphenation is also the words that hyphenate connotations. Examples homographs, different not stop These may be unresolvable problem differently would be of for their minute and produce. The logic above may be augmented with a dictionary which there exceptions. are two types, full dictionaries Thus a word may be hyphenated using the of and above 73 logic then that such a dictionary. syllable If the or word may be checked against word appears as the logic hyphenated it, it is wrong and should be tried over. It is not unreasonable, however to maintain a full dictionary line. Such a dictionary may be searched in one disc access and have a very good chance of having the case, there would be no need by keying in In accepted that As mentioned hyphenation a discretionary hyphen. done, the user's hyphenation is analysis. word. for logic. above, the user is able to override lookup on logic or If this is without further Input To A Typesetter Because the includes the "turnaround on many typesetters developing of the photographic medium, there is a recognized textual time" input need to to the insure the composition perfection program. of One way to minimize typo's is to write the equivalent of a TECO to the macro look for commonly misspelled words and correct them, an example being "teh" for "the". driver file that contained Such a program could read a all the search and replace options. This is even more cogent when the input has produced with an OCR machine. The more sophisticated OCR machines can read pages of material consisting of various These fonts, and machines recognizing some can sometimes an "1" as been type even read hanwritten data. make "comnon" a "1". mistakes as Thus one could search for words that have imbedded numbers (assuming the material not is supposed to have that sort of construct) and fix them. Other "format" errors machine can also be found, as when the has inserted a space between the last character of a word and the period following the word. is of believed dictionary to be program "suspicious" correct, that it may underlines When be input submitted to a words i.e., that are not recognized. then be scrutinized by a human for final the that look The words may judgement. This added care saves time and chemicals. The correction of errors commands can be more in difficult. transform the typesetter or person 74 the actual typesetting These errors serve to entering the typeset 75 commands into the commands. typesetter to a programmer of sorts, since he must debug Often an incorrect mark-up command causes abort execution in unfathomable ways, i.e., with no meaningful diagnostics. as entering picas. a The error can be as simple the column width as ten points instead of ten In such a case, an indention of 2 ten-point ems would cause unresolvable problems. Clearly, text processing is involved in two disparate stages, the data exists as "text" in a computer memory, but also as ink on a wherein the text seemingly is page. If an error occurs positioned on the wrong part of the page b u t i s o t he r wi s e cor r e c t , i t i s mo r e p r u den t t o " pas t e up" the output or to change the input to the typesetter and run it again? Both methods are employed. In the instance in which we re-run, it may be because the file will be used again to produce the same typeset output (as when the generates a telephone book). The other method would be used when the job is a one-time-only job or if it in file is done a small shop that wants to spare the expense of running the job again. •. Pagination In computer typesetting, pagination is the process of outputting codes which cause a typesetting unit to generate fully composed pages. in fact, many of Looking back, one of addressed text. are This stage of the industry the problems the first new; are not fully resolved. (and simplest) problems by the industry was that of setting multi-column Assume we want to set a page of three 60 is lines deep each. columns that We should also assume that the leading and point-size of the lines don't change too. The procedure would then be to read the input file (randomly if possible) and write the 1st, the 61st and the across the top line of the output page. lines Then the paper is advanced within the typesetter and the 2nd, the 121st the 62nd and 122nd lines of the file are written as the second line of the arcane page~ a The reason fashion this must be performed in so is that the typesetter could write only one horizontal line at a time, with no backing-up possible. This scheme worked ok as long as the leading was constant. Now suppose the first and third columns are set as 10 on 12 text (i.e., 10 point letters on 12 point bodies) and the second column is set as 8 on write the 9. The program must then first line, then drop down nine points to write line two of column two. Afterwards, it drops down another three points to write line two of columns one and three and so on. 76 77 Another problem that had to be dealt with was that the serial nature of typesetting commands. if a column of text is to be set, size the of For instance, leading and point and font number are all specified, then all the words in that column are set with those parameters. When column two is begun, the appropriate commands precede it and these commands remain in force for the duration of the setting of the column. to set Now, if the algorithm specified above is used t h r e. e corresponding co l umn s , to the t he types e t t i n g individual command s columns will be "lost". Therefore, the program must discover and save the state of each column for future use. Of course there is a way around all this computing and that is to use a typesetter that can move the paper backwards or a typesetter that can write a whole page at time. These exist now, but did not exist when the above problem was first addressed. discussion of a "blocks" for As mentioned above in the composing, the blocks may be specified to sit anywhere on the page, and therefore the page may be "made up" in that fashion. Perhaps the one driving force in page the widow. A widow a widow cannot the page. Now the rule being Since a widow cannot begin programs must deal with it in one of three ways. -~--- able to The reason for this is lost in the oral annals of printing. ------ is begin a page, and this is sometimes extended to include two or three lines not begin is is a line that is quad-left that is immediately preceded by a justified line. that composition ------- - --~---------- ----- ---- -- ------------- - a page, One is to --------- ~ 78 go back a page or two and reset the text either tighter looser so as to either pull the widow back onto the preceding page or to push an additional line the page with the widow, thus making or it two vertical justification by page "carding". the of lead lines of print so as to increase the leading. This, of course, pushes more lines onto the The and The term comes from the printer's term for inserting cards between onto acceptable. Another method would be to go back to a previous perform or widow's page. other alternative resembles the previous one, and that is concerned with illustrations or photographs. amount of "white" space page, and this space may satisfy a widow surrounds be The certain an illustration on a tampered condition. A with in order to ability to do vertical justification implies the typesetter can advance the page in increments at least as small as one tenth of a point. A similar printer's condition, that facing pages should be of equal depth, is handled in much the same way. There are pagination. therefore One few different ways to do mentioned already is in the Page-3 style of specifying positions program. a of blocks from within a batch Another would be to arrange blocks interact.ively with a "page makeup terminal" in which the operator has the ability whole to move the blocks around at will, then to see the page before it is set. Other special-purpose terminals are used in newspaper advertising offices. These allow tne user to type in text then to enlarge or reduce it using special keys. The characters are "stick figure" 79 characters that are meant to give a proportions of one idea of the type to another and not as the actual representation of the type. capability general These terminals also have the to define irregular shapes around or into which text may be "poured". This completes composition. the treatement of printing and The remainder of the semester was devoted to an introduction to the technology of office automation. word processing and The treatement of these topics from the point of view of their impact on the beyond the scope of this course. office personnel is Word Processing 1 The trend of what we now call with the introduction and speed, normal manner. 1930's with efficient where a p 1 aye r - p i an o ) changed to capability and 1at e r , to merge terms an both in the of the Friden Justowriter. t he These text of advance roller-paper in paper-tape. The "trend" is in made introduction This device punched and read began "scribe" is defined in the Word processing the processing of the typewriter. that of making scribes more accuracy word (similar a 19 5 0 ' s , t he me d i urn was devices from to provided the various paper tapes, and could also perform a rudimentary text search. The term "word processing" was finally coined 1964 in when it Typewriter (MT/ST). storing a introduced improvements Mag Tape This device uses tape textual information. radical the change from in IBM Selectric cartridges for While the technology was not the storage by Justowriter, density chars/inch vs 11 chars/inch with (it paper it could tape), offered store the 20 tapes were recyclable, and the transport machinery (for the tape) was faster. Typewriter In 1969 IBM introduced the Mag Card (MC/ST). Selectric Characters were stored on a magnetic card that could store a minimum of one page of text. the introduction Until of the MC/ST, IBM had little competition IM~;~--~f-~h~-~aterial on word processing derives from Data P~o Dafa Re~orts on Word Processin[, PfojResearch-Corp7;-r~r~T7 ' 80 (Delran, New Jersey: -Tne 81 in the market, but since then, over 100 companies have entered the word-processing arena. In word schemes for processor providing technology the there are two main function of a word processor: standalone units and shared logic. Within the standalone genre, The first is the there and control logic, medium recorder is used (with internal the same platen. within types. It is a coupled with memory and a magnetic The platen paper) to generate the text and to perform to over $10,000. type is card, cassette, or diskette. subsequent editing functions. on four standalone mechanical system. keyboard and platen wherein the keyboard edit are The final document is output These devices are priced from $5,000 Standalone display this genre. In systems this type are another of system, the editing and input are registered on a display screen while the output (i.e., the final product) goes to a daisey wheel printer or a selectric printer. more internal memory than These systems usually have that required for one page's worth of text, so the user may "scroll" throughout long page, pages. and Some of provide special even these divide offer systems are user-programmable is system. more to functions, and of these, some are capable a limited normally These devices data-processing facility, say tallying a column that is flagged this very this long page into smaller of being down-loaded from a magnetic medium. sometimes a found for on this purpose, but a shared-logic type of These systems usually price around $10,000 to over 82 $25,000. The third category is really a sub-category of the previous type of system. "thin window" display It is system. called This idea is that particular line in category in the user memory. can It low see offers gas-plasma). all or part of a from the first that it saves paper, and has a lower purchase end The fourth type of word processing technology. covers It is called the electronic typewriter and is usually an upgrade from normal a or differs price than a full display system. the standalone device partial- or full-line display (either CRT The a typewriter. It has a internal memory, but differs from most word processors in that there is no provision for secondary storage. Many of the above This the case diskettes copying lists. text, to systems are dual-media stations. in which it has provision for two tapes or run concurrently. This facilitates both and the merging of "boilerplate" text with address A being ~boilerplate" a section such as for a form letter. offer a communications of non-varying Some of the systems also interface and the capability to similar to drive a higher-speed printer. Shared logic time-sharing in word that number of stations, processing and processors one minicomputer is used to drive a but document the CPU stream is processing. contrasts to word processing in "word are generation" that in dedicated word Document processing word nature processing includes software resembling to processing while RUNOFF, is document automatic 83 abstracting software, and a table description language {for, perhaps, driving a phototypesetter). "data processing interface". to store data as both enables "DP" to integers search and text ASCII. This files for needed information, as figures from a current annual report, Typically, up to 12 stations high-speed may line computer-type share printers, tape common OCR drives, photo- comp interfaces. that etc. stations are supported by a central minicomputer (but there exist some that drive The a This provides the capability binary programs Some include up to peripherals equipment, network 35). such large as discs, communications This stems from the obvious and point there is a non-trivial computer controlling them all. Such systems share not only the logic but the cost as well. Thus of f i c e s wi t h high v o 1 ume rna y be cos t e f f e c t i v e : costs are measured on a per/station basis, the price if is $10,000 to $20,000 (from an overall cost of $25,000 to over $150,000). the CPU Other advantages are non-trivial: may be used to all searching of very large files, maintaining updates; page also, format the and have right logic systems there hyphenation routines pagination over numerous printing of documents may proceed as other text is being input. systems the power of Most standalone word processing margin are justification, frequently fairly that uie dictionaries. but on shared sophisticated A standalone system normally asks the operator to perform hyphenation. Some advantages of word processors follow: 1. Higher-quality output due to better equipment. 84 2. Output is error-free. 3. Higher utilization of installed machines implies a "word processing" department). 4. Reduced typing time due to the "rub-out" key. 5. Faster output speeds (450 higher). 6. Drastic reduction of "repeat" typing. 7. No retyping at all case). if words for per minute photocomp (a In terms of operational capabilities, the gap word processing quickly. (this and re-run between and text processing seems to be narrowing In the following list enumerates some of the features of many word processors. 1. Characters may be displayed as variable-width, thus showing where lines will break. 2. Underscored characters are underscored on the screen or the start and stop control codes for underscoring are visible. 3. Multi-column display format available. 4. Superscripted and so. 5. Ability to store often-used text for insertion. 6. Displaying of such format-control line-spacing, pitch, etc. 7. Stored-forms processing where the from one· field to the next carriage-return. 8. Ability to perform queued-printing, where CPU manages the print queues. 9. Global search and replace. subscripted characters appear characters as operator moves by typing the central 10. Automatic alignment of numbers. 11. Automatic page numbering, headers and footers. 12. Document merging. Office Automation and Electronic Mail Systems Office automation concept that seeks may to be described as the incorporate discrete technologies into a tool for establishments (corporations, departments) and institutions). "glue" groups of governmental establishments (research Its advent is made possible by the growing capabilities of word processors and/or timesharing computer systems. However, word processors are only element automated office; the impossible without networks. Thus, the rest of experience of which gained an would be with computer technically at least, computer networks, with message software the realization of the a~~ editors are the system relies. basis The on which linking of local intelligence in the form of either editor programs or processors is realized in the form of electronic message systems (also called computer message systems). not enough time reorganization (with two lectures attributable to to an, less-paper-oriented office: This was if that not automation There is time has made paperless, the electronic message the then system The harbiger of the ElVIS is, in a way, the ARPANET. network transmitting is network. a packet-switched network up to lK bits in one packet. to stress at this also office left to the semester). enough, however, to describe the tool (EMS). There to dwell upon the implications of office staff transformation word point that an EMS is capable of It is important not simply a It provides for temporary storage, if needed, and performs long-term (up 85 to weeks) monitoring of 86 The messages. characteristics of EMS's describing are not of any one particular collage ~00 1 of that I will be EMS, but are a (a message system written at SRI), HERMES (by BBN), the DMS 2 message system (MIT), the ETHER-NET (Xerox), and one proposed at Information International. An EMS is a messages are computer. hardware In and somewhat really any misleading in that the English text kept in the memory of a case, software such for a system requires both coordination of messages (the hardware being the network and file used term system). An EMS is to send messages of any length to other people on the "system". within System in this case means a network or which networks a person or concern may be addressed. EMS one can 1) send a file created by an editor; In an 2) type the message directly to the EMS by running the EMS program. There is an. underlying assumption that one needs a terminal to access messages addressed to him: in an automated office, there is normally one terminal associated with each secretary or header of the work station. following form A message always includes a as the beginning of the --------fst~;~t Cracraft, MM User's Manual (Menlo Park, Stanford Research InstTtute~-TnternatTonal, 1979). 2A. Vezza and M.S. Broos, An Electronic Message System: \~ere Does It Fit? (Reprinf rrom-ProceedTngs-or Ca.: !~~-T!::~!!9.~-[[9:-~Q.E:II§:~!I2.!I[-!~!~:.. Q2.J]}Q.!!!.~!. ~~I~~fg[r:- - -- 87 message text: (supplied by the system) (supplied by the system) (recipient list) (carbon list) (subject of message) (lines of text) Date: From: To: cc: Subject: Text: Messages are normally appended to a file in the user's file storage area. duplicated; which In some systems, the a program maintains a file pointers to recipients are messages of are not "pointers" maintained. in In this system, when all the addressees have "seen" the message, it is deleted from the system (unles otherwise specified). Messages may be sent to distribution individuals (self included). lists Also, on and/orspecific some "in house" (meaning not over an external net) EMS's, messages may have an expiration date; after which the addressees who didn't respond within a set time are made known to the sender. such a system, week are messages automatically In within the system longer that a returned. "appended" by an addressee. A message may be If this occurs, the message is marked as "unread" by all others and recirculated. Messages are accessed through a program. action of such To make the a program more concrete, I give here some sample commands from~~ (SRI)3: The user may ask for the status "mailbox" in a number of ways. of messages in his 88 1. Since dd: dd. type titles of all messages since date 2. Subject strin~: give titles of all messages "string" mentioned in the subject heading. with 3. New: give titles of all messages sent here program started. since 4. Seen: give "seen". marked as 5. Text string: titles of messages with "string" the text body. in 6. Unanswered: 7. Deleted: 8. From string: titles of all messages titles of unanswered messages. messages deleted in this session on program termination). (they !~~llY ~!~deleted titles of messages from "string". Disposition commands follow: 1. Read n: Read message number n (all messages your box are numbered by the message system). 2. Answer n: EMS responds with TO: The proper response is ALL or sender. EMS then accepts your message either specification or as input. as 3. Copy n,filename: "filename". into 4. Forward n,addressee: addressee. 5. Delete n: 6. List n: 7. Mark Mark message n as seen. 8. Next Type the next message. Copies message Forward n message a in file f i 1e n to Delete message n at end of session. Print message non lineprinter. Now to send a message one types: l\llM) send To: Soloman,McLure@SRI-KL appears after the"@".) (the network address 89 text, etc. Why would anyone want electronic mail~ Broos 4 say the average business letter costs Vezza and EMS and Writer's time $1.45 Secretary's time $1.76 Postage and paper, etc. $0.59 The postage and paper would be taken care of with the first two would be cut substantially. The same study projects that the Post Office delivered 9 billion pieces of mail that were writer-to-reader in nature. The advent of electronic mail to households would cut the costs of this organization substantially. Then there is a question of privacy -- for the message originator. In business, the secretary of the originator sees the message, as does the receiver's secretary. With electronic mail, this is circumvented. Ease of access is another access his mail carry his own on from point. An executive can anywhere there is a terminal --or business trips. This allows him to swiftly answer any pressing business at the "home office". Other obvious attractions are electronic speed of messages, the receiver messages, and the computer need may the not be "present" to receive be composing and storing the message. used as a tool for 93 to generate intelligent composition software. fonts and Lectures normal printing techniques should remain; provide the ~easoning for other composition on they concerns as kerning and relative character widths. The time saved could be employed in lectures semantic concerns editors, composition languages) and as general base and general publishing. The currently under development by companies International, Inc. for such macro processing (i.e., in languages data on like computer latter is Information use in composing text shared by composing and general data processing programs. The text would be described by its schema entry in the DBMS in terms of its semantic meaning but also in terms of its part in document, as a "paragraph text". under "running head", "paragraph As mentioned, the software is head", a or currently development, but by the next offering of this class, the details may be more formalized. It is difficult to restrain the content of this course so as not to attempt lengthly discussions of all branches of Computer Science. For instance, the Office Automation material may be expanded (for the class's edification) to a detailed discussion queueing theory. on typesetting discussions, of may page. the allow not enough. alone could fill a semester. upon networks and Once again, some restraint in lecturing hardware but packed-switching preceding more time for such The Office Automation topic A sample curriculum, based reflections, appears on the following 90 Professor Ed Fredkin at MIT 5 suggests additions system a that supports EMS in the form of a calendar program that reminds the user of pre-defined on-line to databases and meetings, light computing. access to Obviously, these are elements that could comprise the automated office. --------gi~;;;~ Fredkin, (Cambridge: Artificial unpublished). The MIT Comerehensive Svstem, -rnteTTTgence -tafioraTory~~~T, Conclusion When the editor This as class part stemmed "v a 1u a b 1 e" expressed of from c 1 as s shock at was the anxiety t i me that to on my spec i f y of me part it and capabilities. embroiled asking spent. suggesting Members in p r e f e r en c e s . necessity, of heated a required proposals of others. from supply fighting sup r em a c y , method display) sometimes each the became for their g r o up , of to offer counter examples to the This process one (or class in In need to offer a to debate using The commands the In f i g h t i n g for was proceedings for of by c orrm i t tee • certain class of capabilities, say print-out commands, the the project, I was a bit discouraged. reflection, however, it was time well employed specifying of transformed the class lecture to one of debate. From these debates came ideas that none of us individually would have offered. For instance, abstrace data type was born. the the idea of a Q-register This essentially made each of Q-registers a buffer, and there would be no difference between editing the "buffer" and a Q-register. A simple context switch could provide the generality. It is appropriate to mention here ideas some of the developed in those discussions were not assigned due to the size of the overall project; that that but is was made clear students would receive "extra credit" if they were to implement any extra features. Some students favorite ability feature being the commands on an input line. 91 to did enter so, the multiple 92 While the decision to require the class the editor to implement project in PASCAL appears not to have hampered the implementation, it did hamper the flow of the class in that On it brought about four or five unplanned lectures. closer examination, discussion of handling. in the however, one it manufacturer's allowed a detailed attitude towards text The CDC Cyber 174 is computationally I ight of, say, Digital fast, but Equipment software, its implementation of text-handling tools is not adequate. particulars are given in the main text. In discussing features of PASCAL we were able to discuss the the language characters ability to perform text processing tasks. data type was found to be useful and thereby in creating lessening the The of The "set" classes amount of of code generated. I think it is a good idea to require the project to be written in one particular language for the reason mentioned earlier: the discussion of implementation problems remains homogeneous. That there need be such discussion at all may be questioned until hadn't sufficient editor w i t h out insufficient one realizes programming s orne many of experience d i r e c t i on • Th a t is, the students to develope an there was knowledge of such standard data structures as linked-lists. In teaching this course again, condensation I would recommend of much of the typesetting hardware lectures. However, some knowledge of the hardware, even the versions, a earliest is necessary to appreciate the methods developed 94 A Sample Curriculum For Text Processing Although the course lasts 15 weeks (45 meetings), one week is left unspecified for expansion. 1. (6 meetings) Introduction Introduction to "text processing" through use TECO as an editor and progrrunming language. 2. of (6 meetings) Survey of editors The class is to specify an editor. This is followed by d.iscussion of implementation details, including the source language. 3. (5 meetings) Pattern Matching Details of simple pattern matching as well as TECO's method. May also cover search-optimization techniques and/or generalized search techniques. 4. (6 meetings) Formatting Programs Internal details of text formatting programs. This lays groundwork for the "composition programs" discussion later. 5. (5 meetings) Typesetting Includes a survey of typesetting devices (and their capabilities) and a discussion of printing in general (i.e., characteristics of type). 6. (6 meetings) Composition Programs A discussion of composition programs comparing batch-oriented systems to interactive systems. Should include discussions on hyphenation techniques, page layout and datab~se publishing. 7. (5 meetings) Word Processing and Office Automation A discussion of standalone vs. shared-logic WP systems. Office Automation as a hybrid of word-processing technology and networks. A discussion of such networks as ARPANET and CHAOSNET. Also Electronic Message Systems and how they differ from networks. 8. (3 meetings) Macro Processors discussion of macro processors, their A implementation, and their place in text editors and assembly and composition languages. 95 One point I emphasized and one I feel is important that of hand, the it dual role requires text processing assumes. the logic of complex On one hardware software in order to "compose" text onto a page. "text" assumes i~ dimension of data text-processing processing. application mentioned above is a case form On the other hand, a vehicle of information, and so a and Printi_ng, and by extension, text processing is therefore an art that requires intuition and dilligence. is in The database point. In its purest form, the text stores information that is not easily represented containing by integers. real-estate For instance, a database listings captures information such as owner's requirements for assumption of the title. requirements are rarely amenable to standard codes. requirement could be "the buyer must pay 20% down remove One not the rose bushes from the front yard." This is not a standard stipulation that can be number and These for "20% down" and so represented must by a coded be included in its o r i g i n a 1 f o rm. Text processing is therefore calligraphy. information theory and Bibliography Boyer, Rober t , and J . S t roth e r Moore . "A fa s t S t r i n g Searching Algorithm," Communications of the Association for Comnuting:-!Macfifnerv~--20:7[2-77~~ Ocfooer~-rg7v~- ---~---- --------~ Cracraft, Stuart. lVllVI User's Manual Menlo Park, Ca.: Stanford Researcn--Tnstitufe~-International, 1979. 28 pp. Data Pro ReQ_orts on Word Processing:. A Report Prepared by ---Tfie -nafa--Pro--Researcn-Corp. Delran, New Jersey: The Data Pro Research Corp., 1979. 394 pp. §l~~!!£~~~ Ra~2 §t~pc!!2=22;l~~l ~~~tl~Q~~g ETq~l£~~~Tt ~~!~~!· ft repareu uy Fros~. anu Su T1van, Frost and Sullivan, Inc., 1977. epor~. York: nc. l'lew Fredkin, Edward. The MIT Comnrehensive Svstem, Cambridge: Artificial ___ -rnteTTtgence ____ Liooratory, MIT, unpublished. Grogono, Peter. Programminz in PASCAL. Reading, Mass.: Add i s on-VvesT e y-PuoT 1 s fiT n g-co:~- I n c . , 1 9 7 8 • 3 5 9 p p . Horrowitz, Ellis, and Sartaj Sahni. Fundamentals structures. Woodland Hills,-carrrornTa: Science-Press, Inc., 1976. 564 pp. of Data Compufer Kernighan, Brian W., and P.J. Plauger. Software Tools. Reading, Mass.: Addison-Wesley PubTTsnTng-Company, 1976. 338 pp. Seybold, John. Penn.: Fundamentals of Modern Com2osition. SeyooTa-PuoTicatTons~-Inc:,-r~vv:- Vezza, A. and M.S. Where Does Tfie--Trenas NefworJ<s:-- 96 Media, 394 PP· Appendix A III 1070 EDITOR USER GUIDE COMMAND LIST BY GROUPS 97 98 III 1070 EDITOR USER GUIDE COMMAND LIST BY GROUPS I• CURSOR POSIT I ON I NG COMMANDS B nC J nL nR nU nStext$ nNtext$ I I. SIMPLE TEXT MANIPULATION COMMANDS At ext$ nD Etext$ It ext$ nK Otext$ M Bottom of page Move n Characters forward (-n reverses) Jump to top of page Move n Lines down (-L moves up) Move n characters backward {same as -nC) Move n Lines Up (same as -nL) Search current page for 'text' (nth occurrence) Search entire file for 'text' (nth occurrence) Append 'text' to right of cursor. Delete n characters in current line (n.GT.O only) Exchange 'text' for character Insert 'text' to left of cursor Kill n lines including current line (n.GT.O) Overwrite character(s) with 'text' Merge next line to end of current line III. Q-REGISTER C~~S (%is Q-register A,B,C,D,E,F or G) Load% with n lines beginning at character Load% with text between copy-cursor pair 't' and '1' Set left copy-cursor '~' to current character Q< Q) Set right copy-cursor •l' to current character Clear copy-cursor marks •{• and •l' QZ Kill text between copy cursor marks •l• and •I' QK Get contents of % and insert to left of character Gl6 QI%text$ Insert 'text' into% QT% Type contents of% (follow by typing 'T') n:XOA> QX% IV. FILE MANIPULATING PN PC PX PA PM PK Turn page Write out entire file, continue running EDITOR Write entire file, EXIT to monitor Append the next page of the file to your buffer and insert a "NOT" L Append the next page of the file to your buffer, inserts nothing extra. Release files and restart editor. Q-registers are preserved. V. OTHER USEFUL F vs VT T nT C~MNDS CO~~S Force case change of current character Toggle switch for case-sensitive search To~gle switch for case of TTY ** *Special--refresh display on screen******** Display n lines on screen, minimum=3 SPECIAL KEYS WHICH ACT AS COiV1lVIANDS IN COl\1MAND MODE 99 COMMAND L c R u I ? KEYS L key,Carriage return, line feed, down on keypad C key,Space bar, right arrow on keypad R key,Rubout, back arrow on keypad U key,Up arrow on keypad I key,Tab key (inserts tab too) ? key,Types this message arrow A NOTE ABOUT CHARACTER CONVERSION: The, character allows the creation of any character in the output file. During output (,letter) is converted to corresponding control character. (,number) is converted to the corresponding character in the range of 170-177: i.e • .,5 becomes a) . Appendix B A Sample "Read" Program 100 101 %$D-%$L-\(*,NO DEBUGGING, NO LIST*) (* PROGRAM TO ~~KE LIST IN CORE OF A LINE OF TEXT AND THEN TYPE IT OUT TO TTY, UNTIL EOF SEEN *) TYPE NODE = LINK = PACKED RECORD NEXT: 'NODE PREV: 'NODE DATA: CHAR END; 'NODE (* DEFINE NODE STRUCTURE *) (*POINTER TO NEXT NODE IN LIST*); (*POINTER TO PREVIOUS NODE*); ( * ONE CHARACTER OF LINE * ) (*DEFINE POINTER TO THIS TYPE*); VAR LBEG,LEND: LINK(* POINTERS TO BEGINNING, END OF LIST*); STATUS : INTEGER (*FOR RETURNING STATUS FROM PROCEDURES*); CH : CHAR ( * VARIABLE TO READ CHARACTER INTO *); PROCEDURE READLINE (VAR BEG,EN : LINK; VAR STAT:INTEGER); (* READLINE CONSTRUCTS A LINKED LIST AND SETS POINTER VARIABLES BEG AND EN TO POINT TO THE FIRST AND LAST NODES OF A LIST IF EOF IS ENCOUNTERED BEFORE ANY CHARACTER, STATUS IS SET TO 1 AND A NIL LIST IS RETURNED, OTHERWISE STATUS=O. THIS ROUTINE READS PAST THE END-OFLINE ON ENCOUNTERING IT AT THE BEGINNING OF THE PROCEDURE. *) VAR P : 'NODE (*A LOCAL POINTER VARIABLE FOR "NEW" *); BEGIN BEG:=NIL; EN:=NIL; STAT:=O; P:=NIL; IF EOF ( INPUT) THEN STAT:=1 (* RETURN EOF *) ELSE BEGIN IF EOLN( INPUT) THEN READLN(INPUT) (*GET OVER POSSIBLE EOL *); IF EOF(INPUT) THEN STAT:=1 (* IF WAS LAST LINE *) ELSE BEGIN WHILE NOT EOLN(INPUT) DO BEGIN (* CREATE A NEW NODE FOR EACH CI~ *) NEW(P);read(input,ch); P'.DATA:=cht IF (BEG=NIL) (* N.B. THERE IS NO *J THEN (* CARRIAGE-RETURN NODE *) BEGIN BEG:=P;EN:=P (* INITIALIZE LIST *) END ELSE BEGIN (* ADD TO AN EXISTING LIST *) EN' . NEXT : =P ; pI • PREV: =EN; pI .NEXT: =NIL; EN:=P END END; 101 %$D-%$L-\(* NO DEBUGGING, NO LIST *) (* PROGRA~ TO ~~E LIST IN CORE OF A LINE OF TEXT AND THEN TYPE IT OUT TO TTY, UNTIL EOF SEEN *) TYPE NODE = PACKED RECORD NEXT: 'NODE PREY: 'NODE DATA: CHAR END; LINK = 'NODE (* (* (* (* DEFINE NODE STRUCTURE *) POINTER TO NEXT NODE IN LIST * ) ; POINTER TO PREVIOUS NODE*); ONE CHARACTER OF LINE * ) (* DEFINE POINTER TO THIS TYPE *); VAR LBEG,LEND: LINK(* POINTERS TO BEGINNING, END OF LIST*); STATUS : INTEGER (*FOR RETURNING STATUS FROM PROCEDURES*); CH : CHAR ( * VARIABLE TO READ CHARACTER INTO *); PROCEDURE READLINE (VAR BEG,EN : LINK; VAR STAT:INTEGER); (* READLINE CONSTRUCTS A LINKED LIST AND SETS POINTER VARIABLES BEG AND EN TO POINT TO THE FIRST AND LAST NODES OF A LIST IF EOF IS ENCOUNTERED BEFORE ANY CHARACTER, STATUS IS SET TO 1 AND A NIL LIST IS RETURNED, OTHERWISE STATUS=O. THIS ROUTINE READS PAST THE END-OFLINE ON ENCOUNTERING IT AT THE BEGINNING OF THE PROCEDURE. *) VAR P : 'NODE (*A LOCAL POINTER VARIABLE FOR "NEW"*); BEGIN . BEG:=NIL; EN:=NIL; STAT:=O; P:=NIL; IF EOF(INPUT) THEN STAT:=1 (* RETURN EOF *) ELSE BEGIN IF EOLN(INPUT) THEN READLN(INPUT) (*GET OVER POSSIBLE EOL *); IF EOF(INPUT) THEN STAT:=1 (* IF WAS LAST LINE *) ELSE BEGIN ·wHILE NOT EOLN( INPUT) DO BEGIN (* CREATE A NEW NODE FOR EACH CHAR *) NEW ( P ) ; r e ad ( i n p u t , c h ) ; P ' . DATA : =c h t IF (BEG=NIL) (* N.B. THERE IS NO *J THEN (* CARRIAGE-RETURN NODE *) BEGIN BEG:=P;EN:=P (* INITIALIZE LIST *) END ELSE BEGIN (* ADD TO AN EXISTING LIST *) EN' . NEXT: =P ; P' .PREV:=EN; p I • NEXT : =NIL ; EN:=P END END; 102 END(* OF EOF ELSE*); END(* OF EOF ELSE*); END(* OF PROCEDURE*); PROCEDURE WRITELINE (VAR BEG,EN:LINK; VAR STAT:INTEGER); (* SIMILAR TO READLINE, BUT SIMPLER: RETURNS A "1" IF PASSED A NULL LIST, OTHERWISE STAT=O. WRITES OUT THE CHARACTERS OF THE LIST ONE AT A TIME, THEN OUTPUTS A CARRIAGE-RETURN *) VAR POINTER: 'NODE; BEGIN STATUS:=O; IF (BEG=NIL) THEN STATUS:=! ELSE BEGIN POINTER:=BEG; REPEAT WRITE(TTY,POINTER'.DATA); POINTER:=POINTER'.NEXT UNTIL (POINTER=NIL); WRITELN(TTY); END; END; BEGIN ( * BEG IN THE !\'lAIN ROUTINE *) WHILE (STATUS=O) DO BEGIN READLINE (LBEG,LEND,STATUS); IF (STATUS 0) . THEN WRITELN(TTY, 'EOF SEEN') (*HERE IF ERROR RETURN*) ELSE BEGIN WRITELINE (LBEG,LEND,STATUS); IF (STATUS=!) THEN VVRITELN(TTY, 'Line is NIL'); END; END; END. f ' Appendix C CDC Character set 103 104 TABLE B-1. I TIME-SHARING 64-CHARACTER SET CORRESPONDENCE CODE TERMINALtt INTERNAL APL PRINT STANDARD PRINT DISPLAY CODE CODE CODE CODE CODE l CHAR (7-BIT OCTAL (6112 ·BIT OCTAL) CHAR · (B·BIT OCTAL l CHAR. (B·BIT OCTALl CHAR. (7-BIT OCTAL : 00 ttt 121 : 153 276 072 : : 01 171 171 A 101 A 341 A A . 166 02 B 166 B 342 102 B B 03 172 172 c c 143 303 c c 052 04 052 0 104 344 0 0 0 05 112 E 112 145 E E 305 E 06 163 F 163 F F 146 306 F 07 043 043 G G 347 107 G G "046 046 10 H 350 110 H H H I I I 031 031 151 I 311 I I 12 103 103 J 152 J J 312 J 13 032 K 032 353 K K 113 K 14 106 106 L 154 L L 314 L 15 141 M 141 355 M M 115 M 16 122 N 122 356 N 116 N N 17 105 0 105 157 0 3_17 0 0 p 20 p 013 p p 013 360 120 21 133 Q 133 161 Q 321 Q Q 22 051 R 051 162 322 R R R 23 045 045 s 363 s 123 s s 24 002 T 002 164 T 324 T T 25 062 062 u 365 u 125 u u 26 061 v 061 366 v v 126 v 27 165 w 165 167. w w 327 w 30 142 X 142 170 X 330 X X y y 31 y 147 y 147 371 131 124 32 124 372 132 33 144 0 144 060 0 0 060 0 34 040 I 040 261 I 261 I I 35 020 2 020 262 2 2 262 2 ISO 36 3 160 063 3 063 3 3 37 4 004 4 004 264 4 264 4 40 010 5 5 010 065 5 065 5 41 130 6 6 130 066 6 066 6 42 150 7 7 150 267 7 267 7 43 070 070 8 8 270 8 270 8 44 064 064 9 9 071 9 071 9 ASCII CODE TERMINALt APL PRINT STANDARD PRINT + - *I ( I s = t tt ttt 053 055 252 257 050 251 044 275 + - * I ( I $ = z z z z 055 275 120 257 053 252 374 245 + - *I ( I s = I 023 067 070 007 064 144 004 023 + - *I ( I a = 067 067 013 007 153 Ill 171 010 45 46 47 50 51 52 "53 54 I THE OCTAL CODES LISTED FOR ASCII CODE TERMINALS ARE SHOWN WITH EVEN PARITY (NORMAL) THE OCTAL CODES LISTED FOR CORRESPONDENCE CODE TER-MINALS ARE SHOWN WITH ODD PARITY (NORMAL) USE OF THE COLON IN PROGRAM AND DATA FILES WILL CAUSE PROBLEMS. THIS IS PARTICULARLY TRUE WHEN IT IS USED IN PRINT AND FORMAT STATEMENTS. I Appendix D Set Manipulation Program 105 106 (*$D-,L+*) (* Program to demonstrate a method for defining sets consisting of characters. Written on a DEC-10 *) type uletter = 'A' •• 'Z'; digit = '0' .. '9'; alset =set of uletter (*upper case chrs *); var al,all : alset; ch : char; begin al:= 'A' .. 'C'; all:= 'X' .. 'Z' ch:='C'; i f ( c h i n a 1 ) OR ( c h i n a 11 ) t hen wri teln(tty, 'ch is indeed in al or all') else writeln(tty,'in neither set'); if ch in 'A', 'B', 'C' then begin case ch of 'A':writeln(tty,'was an A'); 'B':writeln(tty,'was a B'); 'C':writeln(tty,'was a C') end; end; end. Appendix E Editor Specification Followed By Its RUNOFF Source File 107 108 The consensus concerning editor functions This editor will be line-oriented; therefore, the smallest addressable runount of text is the line. The editor is written in PASCAL. As a result, such common file operations as specifying an input file nrume and an output file possible (that name from within the program are not is, we don't know how to do it, and no one at DIS does either). So for the time being, filenames must be specified at compile time. The commands to this editor consist each, followed of by its parameters, if any. one-character Any characters in parentheses immediately following the one letter command are given as mnemonic interest and should not be typed as part of the command. A list of the agreed-upon commands and conventions follows: 1. 2 The cursor may be thought of as an integer ranging from zero (in the event of an empty buffer only) to "L" where "L" is the number of lines in the buffer. Thus the character $, as mentioned below, has the current value of "L". 0 The period character may be used in the stead of a numeric argument to signify the cursor position (i.e., the current line). 3. $ The dollar-sign may be used in the stead of a numeric argument to signify the last line's address. 4. L(ist) m Print out line 'm' of the buffer onto terminal the 109 5. · L(ist) m,n Print out lines 'm' through buffer onto the terminal 'n' of the 6. P(rint) n Print onto the terminal the current line and the following m-1 lines if m is positive. If m is negative, print the previous m-1 lines in addition to the current line. 7. P(rint) m,n Print onto the terminal the current line and the previous m lines and the following nor n-1 lines, however the implementor decides. 8. T(op) This command positions the cursor first line of the buffer. 9. B(ottom) Position the cursor at the the buffer. last at the line of 10. D(own) Move the cursor to address following the current line. the line 11. U(p) Move the cursor to address previous to the current line. the line 12• K(ill) n Delete (or "kill") n lines starting at the current line. If n not present, delete one line. 13. K(ill) m,n Kill relative to the cursor. So kill the same lines that would be displayed with a P m,n command. 14. I(nsert) Insert mode accepts lines that have at least one character on each one. Each such line is inserted above the cursor as it is read. A line with only one space on it is considered a blank line, and the space is not inserted into the buffer: only the line node signifies the presence of a null line. If the line is of zero length on read-in, insert mode is exited. 15. A(ppend)string Append mode is similar to insert mode except that lines are added to the buffer following the cursor line. 110 16. M(erge) Concatenate the line following the cursor line to the end of the cursor line. 17. S(earch)text The search command attempts to find a match of pattern "text" in the buffer by checking each line, commencing with the line after the cursor line. No matches are possible around end-of-lines. No delimiter is needed. 18. E(nd)string The end-of-line command searches the current line for string "string" and in the event of a successful search, it breaks the line following the string. Breaking the line is defined as making the remainder of the line a new line that follows the current line. In the event the string comprises the last characters on the line, the new line is blank. Thus if we were to type "Ethe " to the program for the line on which "Ethe " appears above, the new line would begin with "program". 19. C(hange)/string1/string2/ The change command is valid only for the current line. The character "/" used above signifies any delimiter character, being by convention the first non-blank character following the "C" command. The useage is to find the first occurrance of stringl and replace it by string2. 20. Qxn This command puts the cursor line and the next n-1 lines following into Q-register "x" where "x" is one of the 26 letters of the alphabet. It's implementation should put a co~x of these lines into the Q-register, not a pointer to these lines. The useage could be to save the content of the lines then kill the lines from the buffer. The lines can be re-inserted into the buffer at a different position with the "G" command. The "n" is optional. 21. G ( e t) xn 22. X The eXit command causes the contents of the buffer to be written to the output This command inserts a copy of the contents of Q-register "x". If "n" is present, this command is ·repeated so as to get n copies in all. The "n" is optional. 111 file and the program terminated. 23. Z(ed) The Zed command causes the program to be terminated without writing the contents of the buffer to be written to the output file. This command is useful if a drastic mistake has been wrought, and the best course of action is to restart editing from scratch. 24. H(elp) Last but not least is the help command. Typing an 'H' to the program causes a brief list of these commands to be output to the screen for the user's benefit. The description of each command should be worded in the spirit of refreshing the user's memory of available commands and not as a tutorial. 112 .autoparagraph .spacing 2 .ju;.fill . lm 0; • rm 59 The consensus concerning editor functions This editor will be line-oriented therefore, the smallest addressable amount of text is the line. The editor is written in PASCAL. As a result, such common file operations as specifyin~ an input file name and an output file name from &within & the program are not possible (that is~ we don't know how to do it, and no one at DIS does either}. So for the time being, filenames must be specified at compile time. The commands to this editor consist of one-character each, followed by its parameters, if any. Any characters in parentheses immediately following the one letter command are given as mnemonic interest and should not be typed as part of the command. A list of the agreed-upon commands and conventions follows: .no justify .list 1 • lm 0; • rm 59 • lm 10 • rm 52 • 1e The cursor may be thought of as an integer ranging from zero (in the event of an empty buffer only) to "L" where "L" is the number of lines in the buffer. Thus the character $, as mentioned below, has the current value of "L" . • lm 9; • rm 52 .le . :-1m 10 • rm 52 The period character may be used in the stead of a numeric argument to signify the cursor position (i.e., the current line) • • lm 0; • rm 59 . lm 9 • rm 52 .le $ :-1m 10 • rm 52 The dollar-sign may be used in the stead of a numeric argument to signify the last line's address . • lm 0; • rm 59 • lm 9 .le L(ist) m • lm 10 • rm 52 Print out line 'm' of the buffer onto the terminal 113 • lm 0; • rm 59 • lm 9 .le L( ist) m,n .lm 10 .rm 52 Print out lines 'm' through 'n' of the buffer onto the terminal .lm O;.rm 59 .lm 9 • 1e P(rint) n .lm 10 • rm 52 Print onto the terminal the current line and the following m-1 lines if m is positive. If m is negative, print the previous m-1 lines in addition to the current line • • lm O;.rm 59 • lm 9 .le P(rint) m,n • lm 10 . rm 52 Print onto the terminal the current line and the previous m lines and the following nor n-1 lines, however the implementor decides • . lmO;.rm59 .lm 9 .le T(op) .lm 10 . rm 52 This command positions the cursor at the first line of the buffer • • lm O;.rm 59 • lm 9 • 1e B(ottom) .lm 10 .rm 52 Position the cursor at the last line of the buffer . • lm 0; • rm 59 • lm 9 .le D(own) .lm 10 .rm 52 Move the cursor to address the line following the current l i ne • . lm 0; • rm 59 • lm 9 .le U(p) . lm 10 • rm 52 Move the cursor to address the line previous to the current line. 114 • lm 0; • rm 59 .lm 9 .le K(ill) n .lm 10 • rm 52 Delete (or "kill") n lines starting at the current line. If n not present, delete one line • • lm O;.rm 59 • lm 9 .le K(ill) m,n .lm 10 • rm 52 Kill relative to the cursor. So kill the same lines that would be displayed with a P m,n command • • lm 0; • rm 59 • lm 9 .le I(nsert) • lm 10 .rm 52 Insert mode accepts lines that have at least one character on each one. Each such line is inserted above the cursor as it is read. A line with only one space on it is considered a blank line, and the space is not inserted into the buffer: only the line node signifies the presence of a null line. If the line is of zero length on read-in, insert mode is exited • • lm 0; • rm 59 . lm 9 .le A(ppend)string • lm 10 .rm 52 Append mode is similar to insert mode except that lines are added to the buffer following the cursor line • • lmO;.rm59 . lm 9 .le M(erge) .lm 10 • rm 52 Concatenate the line following the cursor line to the end of the cursor line • . lm 0; • rm 59 .lm 9 .le S(earch)text .lm 10 • rm 52 The search command attempts to find a match of pattern "~ext" in ~he buffer by checking each line, commencing with the line after the cursor line. No matches are possible around end-of-lines. No delimiter is needed • • lm 0; • rm 59 • lm 9 .le 115 E(nd)string .lm 10 .rm 52 The end-of-line command searches the current line for string "string" and in the event of a successful search, it breaks the line following the string. Breaking the line is defined as making the remainder of the line a new line that follows the current line. In the event the string comprises the last characters on the line, the new line is blank. Thus if we were to type "Ethe#" to the program for the line on which "Ethe#" appears above, the new line would begin with "program" • • lm 0; . rm 59 .lm 9 .J e C(hange)/string1/string2/ .lm 10 • rm 52 The change command is valid only for the current line. The character "/" used above signifies any delimiter character, being by convention the first non-blank character following the "C" command. The useage is to find the first occurrance of string1 and replace it by string2 . • lm 0; • rm 59 . lm 9 • 1e Qxn .1m 10 .rm 52 This command puts the cursor line and the next n-1 lines following into Q-register "x" where "x" is one of the 26 letters of the alphabet. It's implementation should put a t&copy\& of these lines into the Q-register, not a pointer to these lines. The useage could be to save the content of the lines then kill the lines from the buffer. The lines can be re-inserted into the buffer at a different position with the "G" command. The "n" is optional • • lm 0; • rm 59 • lm 9 • 1e G( e t) xn • lm 10 • rm 52 This command inserts a copy of the contents of Q-register "x". If "n" is present, this command is repeated so as to get n copies in all. The "n" is optional • • lmO;'.rm59 • lm 9 .le X .lm 10 • rm 52 The eXit command causes the contents of the buffer to be written to the output file and the progrrun terminated . • lm 0; • rm 59 • lm 9 116 • 1e Z(ed) .lm 10 .rm 52 The Zed command causes the program to be terminated without writing the contents of the buffer to be written to the output file. This command is useful if a drastic mistake has been wrought, and the best course of action is to restart editing from scratch • • lm O;.rm 59 .lm 9 .le H(elp) .lm 10 .rm 52 Last but not least is the help command. Typing an 'H' to the program causes a brief list of these commands to be output to the screen for the user's benefit. The description of each command should be worded in the spirit of refreshing the user's memory of available commands and not as a tutorial . • els Appendix F "FINDL" Pattern-Matching Program 117 118 (* FILE FINDL.PAS ALGORITHM 'FIND' USING A LIST FOR PATTERN STRING *) (*$D+,l-*) (* SET DEBUGGING, LIST OPTION SWITCHES INSIDE COMMENT *) ( * PROGRAM TO TEST THE SEARCH ALGORITHM "FIND" (SEE BELOW). (* note that a "'" replaces an up-arrow in this listing as that character is not on print-wheel of the diablo *) SEARCHES A LIST IN CORE OF A LINE OF TEXT. *) TYPE NODE = PACKED RECORD (* DEFINE NODE STRUCTURE *) (*POINTER TO NEXT NODE*); NEXT: 'NODE PREV: 'NODE (*POINTER TO PREVIOUS NODE*); ( * ONE CHARACTER OF LINE * ) DATA: CHAR END; (*DEFINE POINTER TO THIS TYPE*); LINK = 'NODE VAR LBEG,LEND : LINK POS,PATPNT,PATHEAD STATUS : INTEGER CH : CHAR (* POINTERS TO BEGINNING AND END OF LIST*); LINK; . ( * STATUS FROM PROCEDURES *); ( * VARIABLE TO READ CHAR INTO *); PROCEDURE READLINE (VAR BEG,EN : LINK; VAR STAT:INTEGER); (* READLINE CONSTRUCTS A LINKED LIST AND SETS POINTER VARIABLES BEG AND EN TO POINT TO THE FIRST AND LAST NODES OF A LIST IF EOF IS ENCOUNTERED BEFORE ANY CHARACTER, STATUS IS SET TO 1 AND A NIL LIST IS RETURNED, OTHERWISE STATUS=O. THIS ROUTINE READS PAST THE END-OFLINE ON ENCOUNTERING IT AT THE BEGINNING OF THE PROCEDURE. * ) VAR P : 'NODE (*A LOCAL POINTER VARIABLE FOR "NEW" *); BEGIN BEG:=NIL; EN:=NIL; STAT:=O; P:=NIL; IF EOF(INPUT) THEN STAT:=! (* RETURN EOF *) ELSE BEGIN IF EOLN(INPUT) THEN READLN(INPUT) (*GET OVER POSSIBLE EOL *); IF EOF(INPUT) THEN STAT:=! (* IF WAS LAST LINE *) ELSE BEGIN WHILE NOT EOLN( INPUT) DO BEGIN (* CREATE A NEW NODE FOR EACH CHAR *) NEW ( P ) ; r e ad ( i n p u t , c h ) ; P ' . DATA : =c h ~ IF (BEG=NIL) (* N.B. THERE IS NO *J THEN (* CARRIAGE-RETURN NODE *) BEGIN BEG:=P;EN:=P (* INITIALIZE LIST *) 119 END ELSE BEGIN (* ADD TO AN EXISTING LIST *) EN I • NEXT: =p; P 1 .PREV:=EN; pI .NEXT: =NIL; EN:=P END END;. END(* OF EOF ELSE*); END(* OF EOF ELSE*); END(* OF PROCEDURE*); PROCEDURE FIND(S:LINK;PAT:LINK;VAR I:LINK); (* IMPLEMENTATION OF ALGORITHM "FIND" FRONI HORROWITZ AND SAHNI. FIND IN STRING S THE FIRST OCCURRENCE OF STRING PAT AND RETURN I AS A POINTER TO THE NODE IN S WHERE PAT BEGINS; OTHERWISE RETURN I AS NIL. *) VAR SAVE,Q,P : LINK; BEGIN IF ((PAT=NIL)OR(S=NIL)) THEN I: =NIL ELSE (* EXECUTE REST OF ROUTINE ONLY IF NOT NIL *) BEGIN I:=NIL (* ~~E SURE I IS NOT VALID ON ENTRY*); P:=S; REPEAT SAVE:=P; Q:=PAT (*SAVE THE STARTING POSITION*); ( * COi\1MENT OUT DEFECTIVE EXAMPLE CODE "~ILE ((NOT(P=NIL))AND(NOT(Q=NIL))AND (P 1 .DATA = Q' .DATA)) DO BEGIN THIS CODE IS INCLUDED AS AN p : =p I • NEXT; EXAMPLE-- IT DOESN I T WORK; Q: =Q I • NEXT ; END· *) ' WHILE NOT ((P=NIL)OR(Q=NIL)) DO BEGIN IF (P' .DATA=Q' .DATA) THEN BEGIN p : =p I • NEXT; Q: =Q I • NEXT; END ELSE P:=NIL; END; IF (Q=NIL) THEN 120 BEGIN P:=NIL; I:=SAVE (* SUCCESS! P IS SET TO NIL TO STOP LOOP *) END ELSE P:=SAVE'.NEXT (*OTHERWISE CHECK STRING BEGINNING AT NEXT CHAR *); UNTIL (P=NIL); END (* OF ELSE ASSOCIATED WITH "IF ((PAT=NIL)OR(S=NIL))" *); END(* OF PROCEDURE 'FIND' *); BEGIN (* BEGIN THE MAIN ROUTINE *) NEW(PATPNT) (*CONSTRUCT A PATTERN IN A LIST*); PATPNT'.DATA:='F'; PATHEAD:=PATPNT; NEW( PATPNT); PATPNT 1 • PREY: =PATI-IEAD ; PATHEAD 1 • NEXT: =PATPNT; PATPNT I • DATA: = I 0 I ; NEW(PATPNT); PATHEAD'.NEXT'.NEXT:=PATPNT; PATPNT'.PREV:=PATHEAD 1 .NEXT; PATPNT 1 .DATA:='0' (*DONE CONSTRUCTING PATTERN*); READLINE (LBEG,LEND,STATUS); IF (STATUS 0) THEN WR I TELN (TTY, ' EOF SEEN ' ) ( * HERE IF ERROR RETURN * ) ELSE BEGIN FIND(LBEG,PATHEAD,POS); IF NOT(POS = NIL) THEN WRITELN(TTY' I FOUND IT I) ELSE WRITELN(TTY, 'NOT FOUND'); END; END. Appendix G "FINDA" Pattern-Matching Program 121 122 (* FILE FINDA.PAS -- ALGORITHM 'FIND' USING AN ARRAY FOR THE PATTERN STRING *) (* note that a "'" replaces an up-arrow in this listing as that character is not on print-wheel of the diablo *) CONST MAXPAT = 50; TYPE NODE = PACKED RECORD (* DEFINE NODE STRUCTURE *) NEXT: 'NODE (*POINTER TO NEXT NODE*); PREV: 'NODE (* POINTER TO PREVIOUS NODE*); ( * ONE CHARACTER OF LINE *) DATA: CHAR END; LINK = 'NODE O (* DEFINE POINTER TO THIS TYPE *); PATTERN = ARRAY 1 .. MAXPA1j OF CHAR; MAXLEN = 1 .. MAXPAT; VAR LBEG,LEND : LINK (* POINTERS TO BEGINNING AND END OF LIST*); PATARR : PATTERN; PATLEN : MAXLEN; POS,PATPNT,PATHEAD : LINK; STATUS : INTEGER ( * RETURN FROM PROCEDURES * ) CH : CHAR ( * VARIABLE TO READ CHR INTO *); PROCEDURE READLINE (VAR BEG,EN: LINK; VAR STAT:INTEGER); (* READLINE CONSTRUCTS A LINKED LIST AND SETS POINTER V~niABLES BEG AND EN TO POINT TO THE FIRST AND LAST NODES OF A LIST IF EOF IS ENCOUNTERED BEFORE ANY CHARACTER, STATUS IS SET TO 1 AND A NIL LIST IS RETURNED, OTHERWISE STATUS=O. THIS ROUTINE READS PAST THE END-OFLINE ON ENCOUNTERING IT AT THE BEGINNING OF THE PROCEDURE. *) VAR P : 'NODE (*A LOCAL POINTER VARIABLE FOR "NEW" *); BEGIN BEG:=NIL; EN:=NIL; STAT:=O; P:=NIL; IF EOF(INPUT) THEN STAT:=1 (* RETURN EOF *) ELSE BEGIN IF EOLN(INPUT) THEN READLN( INPUT) (* GET OVER POSSIBLE EOL *); IF EOF(INPUT) THEN STAT:=1 (* IF WAS LAST LINE *) ELSE BEGIN 'WHILE NOT EOLN( INPUT) DO BEGIN ( * CREATE A NEW NODE FOR EACH CHAR *) NEW ( P ) ; r e ad ( i n p u t , c h ) ; P ' . DATA : =c h t IF (BEG=NIL) (* N.B. THERE IS NO *J THEN (* CARRIAGE-RETURN NODE *) BEGIN 123 BEG:=P;EN:=P (* INITIALIZE LIST *) END ELSE BEGIN (* ADD TO AN EXISTING LIST *) EN I • NEXT : =p ; P 1 .PREV:=EN; P' .NEXT:=NIL; EN:=P END END; END(* OF EOF ELSE*); END(* OF EOF ELSE*); END(* OF PROCEDURE READLINE*); PROCEDURE FIND(S:LINK;PAT:PATTERN; VAR I:LINK;PLENGTH:MAXLEN); {* ' AN ALTERNATE IMPLEMENTATION OF ALGORITHM "FIND" FRaVI HORROWITZ AND SAHNI. AND FIND IN STRING S THE FIRST OCCURRENCE OF STRING PAT RETURN I AS A POINTER TO THE NODE IN S BEGINS; OTHERWISE RETURN I AS NIL. ~~ERE PAT *) VAR SAVE,P : LINK; Q: 1 .. MAXPAT; BEGIN IF ((PLENGTH=O)OR(S=NIL)) THEN I: =NIL ELSE BEGIN (* ATTEMPT A STRING IF STRING( 2) I:=NIL P:=S; MATCH STARTING WITH BEGINNING OF EACH NO MATCH, TRY TO l.VIATCH PAT( 1) WITH AND SO ON *) (*SET TO NIL IN CASE WE \VERE PASSED A NON-NIL VALUE*); REPEAT SAVE:=P; Q:=1 (*SAVE THE STARTING POSITION*); WHILE NOT ({P=NIL)OR(Q)PLENGTH)) DO BEGIN IF {P 1 .DATA=PAT[Q]) THEN BEGIN p : =p I • NEXT ; Q: =Q+1; END ELSE P:=NIL; END; IF (Q)PLENGTH) (* CHECK TO SEE IF WE EXAUSTED 124 PATTERN LENGTH *) THEN BEGIN P:=NIL; I:=SAVE (* IF SO, WE HAVE A MATCH*) END ELSE BEGIN P:=SAVE' .NEXT (* OTHERWISE, CHECK STRING BEGINNING WITH NEXT CHAR *) END; UNTIL (P=NIL); END (* OF THE ELSE ASSOCIATED WITH 'IF ((PLENGTH=O)OR(S=NIL)}' *); END(* OF FIND*); BEGIN (* BEGIN THE MAIN ROUTINE *) (* ARTIFICIALLY LOAD PATTERN ARRAY WITH A STRING *} PATARR(l} :='F'; PATARR(2] :='0'; PATARR (3] :='0' · PATLEN:=3 (*SEARCH TAHGET STRING FOR 'FOO' *i; READLINE (LBEG,LEND,STATUS)(* GET FIRST LINE OF THE FILE I INPUT I*) ; IF (STATUS 0) THEN WR I TELN (TTY, ' EOF SEEN ' ) ( * HERE IF ERROR RETURN * ) ELSE BEGIN FIND(LBEG,PATARRtPOS,PATLEN); IF NOT(POS = NILJ THEN WRITELN(TTY,'FOUND IT') ELSE WRITELN(TTY,'NOT FOUND'); end; END. , Appendix H Flow of RUNOFF Justification 125 126 j-.y SoaceJ o, Lt l"'e7 r~ 1~0 I I I l_ _____..,. 'J ,( sp•'-«> •'• w,, J.t.J ~foe /,ff•( I-f• /,;,,, Cx<P:> 1 1'1UMJ,,.. ,/-' f,:.,.~ w>t aJ./ ,_. ~ &drf,.L $..0.:. c.e; 4Y ~,_.,N ,_~ a UIViVr r~• u.o ~ ""'~ IJy J~h/'~ lhJt:DA.tD/f'JJn~t/'lt' tiJ.. 1 ,r ;:-:::::::::::=:: :::::-:::::::::::::::::::: ::.:;:::::: :::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::: ~.; ~~AC1Ll b ~.·:.:...c..o:.la.t!Q -.lS oo .................... _ ................ ,.I I I I I I I I I I I I I A ~ :0 I I .I I ____ JI tl~/,,:;'1 a.,. ~-rr~ ~t7•CC. Appendix I Text Processing Midterm and Final 127 128 Text Processing Wi 11 i ams cs Midterm 361 April 4 1979 This is a closed book, closed notes test. Please answer all questions in the space provided on the test. The following are short essay questions to be answered as concisely as possible. A simple "Yes" or "No" is too concise. 1. What is meant by a meta character? 2. What is the main difference between TECO line editor (in terms of addressing)? 3. Why is it necessary to type "271" to TECO in order to insert an alt-mode (symbolized by $)? and a 4. J(:SProcedure P'X$;0tt.-z;)$$ What does this macro do? 5. In TECO, what is a reasonable way to from other macros? 6. TECO usually requires an alt-mode to delimit commands. Please specify what genres of command require such a delimiter (in any editor). 7. In the text editor we specified, what convention would need to be changed to insert a line below the last line in the buffer if the "Append" command didn't exist? 8. In our .PASCAL, why would it be necessary to implement a separate dispose routine for different record types? call macros 129 9. It has been alle~ed that our editor is easier to implement with linked lists. How would you defend this thesis? 10. Why is it necessary to know the pattern string when simple implemented with an array (as linked-list)? 11. How would you change the editor specified in class to allow the user to type multiple commands on one line? 12. Why is it so easy for TECO to allow a user to type 'N (i.e., a "control-N") before any one of its other search-string-specifying commands, when that following command may be arbitrarily complex? 13. In many searching algorithms, a pass is made over the pattern strin~ to "compile" it into an internal representation. Why? 14. What is a major drawback of algorithm implemented in TECO? 15. In many text formatters, a distinction is made between the line length and the line width of the output line. Define these terms. 16. In formatted text: a) What is a "river"? b) How is it avoided? length of the searching is opposed to a the searching 130 17. The Software Tools formatter provides a facility to oufpuf-fhe-page number either in the header or the footer, in any character position. How is this accomplished? 18. In respect to text formatters: 19. Assume you are told to install a "figure" command in a text formatter that allows enough space for a figure of a certain size. a) Give the exact format of the command (for the user) b) 20. What is a "Break"? Specify how you would implement it. One way to implement a control structure in the editor is to use a case statement to recognize which command has been typed. a) What happens if the user types an illegal cornnand? b) How would you protect yourself from this? 131 21. The following is a PASCAL attempt to implement the previous question. What is wrong with it? (Give PASCAL language faults.) Note that Teletypeinputroutine is a function that returns the next character from the teletype, and that the i den. t i f i e r s end i n g i n "command" are p r o c e d u r e s . Finding 4 faults is sufficient for full credit. ~!£~~g~!~ Commandloop; Legalcommand = (A,B,C,D,E,Z); LC : Legalcommand; CH : char; STATUS: integer; TTYSTAT:=O; STATUS=O; CH := Teletypeinputroutine; IF CH IN LC THEN BEGIN ----CASE CH OF A: B: C: D: E: Z: ~~; Appendcommand; Bottomcommand; Changecommand; Downcommand; Endoflinecommand; Quitcommand(sets "status"); 132 Text Processing Williams cs Final Exam 361 May 21, 1979 This is a closed book, closed notes test. Please answer all questions in the space provided on the test. The following are short essay questions to be answered as concisely as possible. A simple "Yes" or "No" is too concise. 1. What is meant by a meta character? 2. What are the two most significant differences betwen RUNOFF and a composition program for a typesetter (like PAGE-3)? a) b) 3. Why is it not possible to make the declaration in NOS PASCAL? TYPE Charset = SET OF CHAR; following 4. The matrices of many pre-TTS typesetters did not have widths relative to each other. How was justification achieved on these machines? 5. What is meant by the of type measurement? 6. What is meant by photocomposition? 7. In-regards to typeface design, what by "x-height"? "relative-width "base-line system" alignment" is in meant 133 8. What are two options available to a printer (or composition programmer) in eliminating a "widow" line? a) b) 9. What is a "discretionary hyphen" and its use? 10. What is meant by descendent from? 11. a) What is "kerning"? "TTS" b) How is it accomplished cold typesetting? and in what either what is is it hot or 12. What is "mortising" (in printing)? 13 . Wh a t is t h e rna i n third-generation typesetter? 14. What distinguishes a typesetter from others? 15. What are two methods for sizing (making larger or smaller) characters in a phototypesetter? a) b) difference between a and second-generation fourth-generation 134 16. Many fonts are measured in parts of an "em". That is, an "em-space" may measure 18 units, while an "en" space only nine units, regardless of point size. Please give the conditions under which it is necessary to convert from this system of measurement inside a typesetter and why. 17. Give three methods of storing width values (of characters) on typesetters (not just third-generation). a) b) c) 18. What is meant by a perimeter character? 19. What is meant by "run length encoding" and how is it used? 20. In justifying a line, there are two ways to manipulate the amount of space occupied by the words on that line. What are they? 21. What is meant by "leading"? 22. What is vertical justification and it be used? why would 135 23. In hyphenation: a) What is meant by "cleaning up the word" and why is it done? b) Why would it be characters within a up"? necessary to examine word while "cleaning it 24. What are the two main accomplish hyphenation? methods used to 25. What is "vertical justification"? 26. In hyphenating a word, some procedures attempt to hyphenate by beginning from the right end of the word. Why would this be preferred to working from the left? 27. What is meant by a "homograph"? 28. What is an exception dictionary and how is used? 29. What is a common method of ensuring the correct spelling for input to a typesetter? 30. The early composition programs set 3-column text by reading the first line for each column from a file by random-access, then settin~ the first line of each column with these lines. a) Why was this necessary? it b) What problems were discovered in using this procedure? 136 31. What were two advantages of Selectric Typewriter? the IBM Magtape a) b) 32. What is the difference between an typewriter and a word processor? electronic 33. What is meant by a display system? standalone 34. What are two advantages of station in word processing? 35. What is a "boilerplate"? 36. What is a shared-logic word processor? 37. What is the main disadvantage shared-logic word processor? 38. What is meant by a "data processing interface" in relation to word processing? 39. What are two advantages of processors? 40. How do word processing companies. justify relatively high cost of a word processor? 41. How does an Electronic Message from a computer network? "thin window" a "dual media" of shared-logic System a word the differ 137 42. \Vhat are two methods of managing Electronic Message Systems? 43. What are four reasons that Electronic Message System? would messages in justify an a) b) c) d) 44. What is meant by a "passive" network?