\begindata{text,538567440} \textdsversion{12} \template{default} \define{global } \define{concat menu:[Font,Concat] attr:[Script PreviousScriptMovement Point 6] attr:[FontSize PreviousFontSize Point 2]} \define{sans menu:[Font,Sans] attr:[FontFamily AndySans Int 0]} \flushright{16 July 1993} \majorheading{Ness: A Short Tutorial} \center{by Wilfred J. Hansen} \indent{\flushright{\smaller{_________________________________ Andrew Consortium Carnegie Mellon University __________________________________ }} Ness is a programming language for the Andrew ToolKit. With it, documents can be processed and can even contain active elements controlled by Ness scripts. The language features an innovative substring algebra for string processing. \ This document is a tutorial description of the language itself. If you are familiar with other programming languages, you may still benefit from reading the sections of this manual that describe the string algebra. For a description of other Ness documentation, see \bold{Ness User's Manual}. 1. String Algebra 5. An Example Appendix: /usr/andrew/lib/ness/demos \smaller{ }}\smaller{\bold{\ \begindata{bp,538539008} \enddata{bp,538539008} \view{bpv,538539008,6,0,0}} Copyright IBM Corporation 1988, 1989 - All Rights Reserved Copyright Carnegie Mellon 1993 - All Rights Reserved \ \smaller{ $Disclaimer: Permission to use, copy, modify, and distribute this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice, this permission notice, and the following disclaimer appear in supporting documentation, and that the names of IBM, Carnegie Mellon University, and other copyright holders, not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. IBM, CARNEGIE MELLON UNIVERSITY, AND THE OTHER COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL IBM, CARNEGIE MELLON UNIVERSITY, OR ANY OTHER COPYRIGHT HOLDER BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. $ $Log: nesstut.doc,v $ Revision 1.11 1994/02/08 22:05:14 rr2b 04:31:45 wjh typescript -> command DOC Revision 1.10 1993/07/17 remove notes about future developments modernize layout, copyrights, etc. } Revision 1.6 1991/07/09 14:06:11 wjh \ fixed invisible index \ fixed search example Revision 1.5 1990/10/31 14:42:34 wjh \ added a compute-all-subsets example \ }\ \begindata{bp,538539072} \enddata{bp,538539072} \view{bpv,538539072,7,0,0} Ness programs are a collection of definitions of variables and functions. For instance: \example{function main() \leftindent{printline ("Hello, world")} end function} When executed, this program will print "Hello, world" in the command window. To execute a Ness program from a command window, give the nessrun command. If the text above is in file hello.n, the command would be \example{nessrun hello.n} Nessrun compiles the program into an internal form and executes it by calling the function main(). The name \italic{main} is magic--Nessrun always calls it first. \ If you want to pass parameters to main(), you can write a string after the name of the Ness program file: \example{nessrun hello.n This is a string} The string after the file name is passed as an argument to main(). access it, main needs to declare an argument: To \example{function main(arg) \leftindent{printline(arg)} end function} The printed output is a single line saying \example{This is a string} Ness programs can do arithmetic. few Fibonacci numbers: \example{ function main() \leftindent{integer i Here is a program to print the first i := 0 while i <= 12 do \leftindent{printline(textimage(Fibonacci(i))) i := i + 1} end while} end function; integer function Fibonacci (integer x) \leftindent{if x <= 1 then \ \leftindent{return x} else \leftindent{return Fibonacci(x-1) + Fibonacci(x-2)} end if} end function} With the above in file /tmp/fib.n, a typical execution would look like this: \example{% \bold{nessrun /tmp/fib.n} Compile okay. 0 1 1 2 3 5 8 13 Elapsed time is 4360 msec. 21 34 55 89 144 Execution okay. Elapsed time is 280 msec.} Items to note about the program: * The \bold{for} statement is currently not implemented, so it has to be simulated with a \bold{while} statement as shown in main(). * Since the Fibonacci function returns an integer value, it cannot be printed directly; it must be converted to a string value by passing it to the function textimage(), which converts any value to an appropriate string. (Plans are in the works for letting print() and printline() directly convert integer values to strings.) * Function arguments and return values are by default marker values, so the types must be explicitly declared \bold{integer} for Fibonacci() and its argument x. * The algorithm shown is highly inefficient; consider how many times Fibonacci(1) is computed. The reader is invited to write a better version of this program and try it out. \section{1. String Algebra} String values in Ness are formally known as \italic{marker} values because such a value does not actually denote a string, but rather refers to, or marks, a substring of an underlying base string. A program can declare marker variables: \example{\smaller{\italic{marker} s, m} } and assign them references to constants: \example{\smaller{s := "abcdefg" m := s}} Now s refers to the entirety of the base string a b c d e f g . And m refers to the same. This is a fundamental property of assignment of marker values: m \italic{does not} refer to a copy of \typewriter{s}--it refers to the \italic{same} base string as \typewriter{s}. \ Four primitive functions start(), next(), base(), and extent() are provided for manipulating string values. (Many additional functions are defined in terms of these primitives.) \bold{start(x)} - The value of start of x is a marker refering to a position between two characters. In particular, the value refers to the position just before the start of the value referred to by x. Suppose x refers to the substring c d e of the value a b c d e f g above, then the value of start of x is a marker for the empty position between b and c. \bold{base(x)} - Returns a marker for the entire base string of which x refers to a part. Suppose again that x refers to the substring c d e of the value a b c d e f g , then the value of base(x) is a marker referring to the entire value a b c d e f g . To get an empty mark at the beginning of the underlying base string for marker p, you can use the composition of the functions start and base: start(base(p)). The opposite composition, base(start(p)), returns exactly the same value as base(p) because p and start(p) both refer to the same underlying base string. \bold{next(x)} - This function returns a marker for the character following the value marked by x. When x is c d e , next(x) is a marker for f on the same base string. If the argument x extends all the way to the end of its base string, then next(x) returns an empty mark for the position just after the last character of x. \ Now we can write expressions for more interesting subsequences relative to a given mark t. The first character \italic{after the beginning of t} is next(start(t)) because start(t) computes the empty marker just before the first character and next() computes the character just after its argument. Note the careful inclusion of the phrase "after the beginning of t"; next(start(t)) does not compute the first character of t if t is a marker for an empty substring--such a marker has no characters and thus certainly no first character. Similarly, the second character after the beginning of a given marker t is next(next(start(t))) and the first character of the base is next(start(base(t))). \bold{extent(x, y)} - Computes a marker from two other markers on the same base. The new marker value extends from the beginning of the first argument, x to the end of the second argument, y. So if x is c d e in a b c d e f g and y is b c d e f then extent(x, y) will extend from just before c, the first character of x, until just after f, the last character of y; the value will be c d e f. The two arguments need not overlap or even be contiguous; if x is b c and y is e f the value of extent(x,y) is b c d e f. \ It may be that the first argument of extent() begins after the second ends. In this case extent() is defined to return the a marker for the empty position just after the \italic{second} argument. Note that this position comes \italic{before} the beginning of the first argument. Another possible problem with the arguments to extent() is that they may be on different base strings. In this case, the value returned is a marker on a unique base string whch has no characters. With the aid of extent() we can write even more useful functions. Suppose we really want the first character of a marker; the result should be empty if the argument marker is empty. We approach indirectly by first finding an expression for all characters after the first character of a marker value. Let t be the marker value and assume for the moment it has at least one character. From earlier work we know that the first character is given by next(start(t)) and the character after that by next(next(start(t))). Now the sequence of all characters after the first character begins with that second character and extends to the end of t, so we can write the function rest(): \example{\smaller{\italic{function} \bold{rest}(t) \leftindent{\italic{return} extent(next(next(start(t))), t)} \italic{end function}}} Now imagine that the argument t for rest() has no characters. The first argument to the extent function will be a marker which begins after the end of t, the second argument; so the result of the extent() will be the end of that second argument. But since t is empty, the end of t is identical to t itself and the function rest() applied to an empty marker returns a marker equal to its argument. With the aid of rest() we can compute the first character of a marker: it is the marker which extends from the beginning of the original marker up to the beginning of the rest(). \example{\smaller{\italic{function} \bold{first}(t) \leftindent{return extent(t, start(rest(t)))} \italic{end function}}} It may seem inefficient to compute first() and rest() by such round-about means, and it is. These are simply formal definitions in terms of the four primitives start(), base(), next(), and empty(); in practice first() and rest() can be implemented more efficiently by writing them in the same low level language as is used for the primitives. In Ness we can represent a set as a sequence of its elements and write an algorithm for all subsets. The result is a sequence of subsets, each in parentheses. For example, if the input is abc the result is (abc)(ab)(ac)(a)(bc)(b)(c)(): \example{\smaller{\italic{function} \bold{subsets}(s) \italic{if} s = "" \italic{then return} "()" subset -- the empty \italic{else} \ \italic{marker} m := subsets(rest(s)) -- all subsets of all but first element \italic{return} combine(first(s), m) ~ m -- concatenate: -- all subsets of rest(s) with first(s) inserted in each -- and all subsets of the rest(s) \italic{end if end function function} \bold{combine}(t, s) \italic{if} s = "()" \italic{then return} "(" ~ t inserted in empty subset ~ ")" -- t \italic{else} \ \italic{marker} m := extent(s, search(s, ")")) -- first subset \italic{return} "(" ~ t ~ rest(m) first subset ~ combine(t, allnext(m)) subsets -- t inserted in -- t inserted in remaining \italic{end if end function} }} The second function, \bold{combine}, takes an element and a sequence of subsets and forms a new sequence of subsets with the element inserted in each subset. This is utilized by the first function which initially splits its argument into a first element and a remainder, then finds all subsets of the remainder, and finally concatenates this list of subsets with the list generated by combining the first element with the subsets from the remainder. \section{2. Search} Search functions in Ness search through the base of their first argument looking for some substring described by the second argument. The return value is a marker referring to a substring of the first argument. The sequence \smaller{\example{\italic{marker} m := "abcdef" m := search(m, "cd") printline(m)}} will print \bold{cd}, but it will print a copy of the \bold{cd} within \italic{m}. So if other string operations are applied to the result of the search, they will locate other text within \italic{m}. Thus \smaller{\example{\italic{marker} m := "abcdef" \italic{marker} s s := search(m, "cd") printline(extent(next(s), m)) }} will print \bold{ef}, the characters starting with \italic{next(s)} and extending to \italic{finish(m)}. \ By convention, the string searched is delimited by the precise details of the first argument: If it is a zero-length substring, the search extends from it to the end of the base; but if non-zero, the search occurs only within the substring delimited by that first argument. For a search that does not succeed, the value returned is a zero-length marker at the position of the end of the first argument. Thus a first argument of start(x) differs from one of toend(x) in the location of the empty marker returned for a failing search. Two of the search functions, \italic{search}() and \italic{match}() treat their second argument as a string to be matched exactly in the first (white space and the case of characters must all match exactly). The other three treat their second argument as a set of characters. For instance, in the expression \example{\smaller{anyof(toend(x), ".?!")}} the set has three characters and the expression will return a marker for the first character in base(x) after start(x) which is a period, question mark or exclamation point. If there is no such character, the expression returns an empty marker at the end of toend(x), which is the end of base(x). \indexi{search()}\description{\italic{search}(x, pat) - find the first instance of pat after start(x). \indexi{match()}\italic{match}(x, pat) - match pat only if it begins at start(x). \indexi{anyof()}\italic{anyof}(x, set) - find the first (single) character after start(x) that is one of the characters in the set. \indexi{span()}\italic{span}(x, set) - match start(x) and all contiguous subsequent characters which are in the set. \indexi{token()}\italic{token}(x, set) - find the first substring after start(x) which is composed of characters from set. \ } As an exercise in full understanding of the search conventions, here is a definition of token() written in Ness: \example{\smaller{\italic{function} \bold{token}(x, set) \leftindent{\italic{marker} m := anyof (x, set) \italic{if} m = "" \italic{then} \ \leftindent{-- there is no element of set -- in the search area delimited by x} \leftindent{\italic{return} m} \italic{end if if} x = ""\italic{ then} \leftindent{-- we have to check from m to the end of base(x) \italic{return} span(toend(m), set)} \italic{else }\ \leftindent{-- the token must be between m and the end of x \italic{return} span(extent(m, x), set)} }\italic{\leftindent{end if} end function}}} \section{3. Concatenation} Two string values can be concatenated with the tilde operator: \concat{~} \ printline("first" ~ "second") will print "firstsecond"; note that there is no space. We can write another fibonacci function which generates a string of a fibonacci length: \example{\smaller{\italic{function} \bold{fibonacci}(integer n) \leftindent{\italic{if} n < 2 then return "*" \italic{end if} print(textimage(n) ~ " ") \italic{return} fibonacci(n-2) ~ fibonacci(n-1)} \italic{end function} }} From the call printline(fibonacci(5)), the value printed is 5 3 2 4 2 3 2 ******** Concatenation always generates a new string value. the definition of the copy() function: Indeed it is part of \example{\smaller{\italic{function} copy(x) \leftindent{\italic{return} x ~ "" -- to copy x, concatenate it to the empty string} \italic{end function}}} \section{4. Replacement} In most applications it is sufficient to process an input string or file and construct the output by concatenation of the pieces of the input with other constants as needed. Sometimes, however, for example when modifying the text displayed in an editor, it is prefereble to replace a piece of the text with some other text. For this purpose Ness offers the \italic{replace}() function. \ Replace(a, b) modifies the base string of a so the portion that was delimited by a now contains b. The value returned is a marker value referring to the new contents of the portion of a; even though the contents are a copy of b, they are not in the same base as b. Thus the statements \leftindent{marker m := copy("abcdef") replace(next(next(start(m))), "QWE") } will give to m the value aQWEcdef. replace Note that the first argument of cannot be a constant, so copy() was used to make a modifiable copy of "abcdef". When a replace of marker x is made within a base string, base(x), something must be done about all other markers that refer to portions of base(x). Those that start before x and end after it will have there contents modified to include the new replacement characters. Those that start after the end of x must be modified so they refer to the same characters as they did before. More interesting cases occur for existing markers which overlap x. In general the replacement is made as though the new text is inserted at the end of x and then the old x contents are deleted. More precise rules can be derived from the examples in Table 1. \leftindent{\formatnote{.ne 5.5i} Replace "\sans{efgh}" in "\italic{\sans{xyz}}" \sans{ abcd efgh ijkl} with The substring / d\italic{xyz}ijkl} \sans{a / bc / defghijkl} becomes \sans{a / bc The substring / \italic{xyz}ijkl} \sans{a / bcd / efghijkl} becomes \sans{a / bcd The substring / \italic{xyz}ijkl} \sans{a / bcde / fghijkl} becomes \sans{a / bcd < The substring \sans{a / bcdefgh / ijkl} bcd\italic{xyz} / ijkl} becomes \sans{a / The substring \sans{a / bcdefghi / jkl} bcd\italic{xyz}i / jkl} becomes \sans{a / The substring / \italic{xyz}ijkl} \sans{abcd / efg / hijkl} becomes \sans{ abcd / < The substring \italic{xyz} / ijkl} \sans{abcd / efgh / ijkl} becomes \sans{abcd / The substring \italic{xyz}i / jkl} \sans{abcd / efghi / jkl} becomes \sans{ abcd / The substring / \italic{xyz}ijkl} \sans{abcde / fg / hijkl} becomes \sans{ abcd / < The substring \italic{xyz} / ijkl} \sans{abcde / fgh / ijkl} becomes \sans{abcd / The substring \italic{xyz}i / jkl} \sans{abcde / fghi / jkl} becomes \sans{abcd / > The substring \sans{abcdefgh / / ijkl } \sans{abcd\italic{xyz} / / ijkl} becomes > The substring \sans{abcdefgh / i / jkl} \sans{abcd\italic{xyz} / i / jkl} becomes Replace empty string between \sans{c} and \sans{d} in with "\italic{\sans{xyz}}" \sans{abcdef} < The substring \italic{xyz}def} \sans{a / bc / def} becomes \sans{a / bc / The substring bc\italic{xyz}d / ef} \sans{a / bcd / ef} becomes \sans{a / < The substring \italic{xyz}def} \sans{abc / / def} becomes \sans{abc / / > The substring \sans{abc / d / ef} \sans{abc\italic{xyz} / d / ef} becomes The substring \sans{abcd / e / f} \sans{abc\italic{xyz}d / e / f} becomes }\indent{\bold{Table 1. The effect of replace() on other subseqs.} The replace performed for each group of lines is shown above the group. Each line shows an example of some other subseq on the same base and its value after the replacement. The base string is letters only; the spaces are for readability and the slashes indicate the extent of the subseq values. In general the replacement is made by inserting the replacement at the finish of the replaced value and then deleting the replaced value. The <'s mark examples where the other subseq ends at the end of the replaced string and the >'s indicates examples where it begins there. } \section{5. An Example} The string processing sub-language is illustrated in the example below. String values are formally known as \italic{marker} values because they do not actually denote strings, but only refer to, or mark, substrings of underlying base strings. The marker variables in the example are \italic{letters}, \italic{text}, \italic{args}, and \italic{filename}. Concatentation, with \concat{~}, of marker values creates a brand new string value and returns a marker referring to its entirety. Marker assignments assign only the reference and do NOT create a new string that is a copy of the old. The functions token() and span() are built-in marker functions which begin scanning at the start of their first argument and look for a string meeting the pattern in the second argument. Span() scans its first argument and returns a marker for the maximal initial substring consisting of characters in the second argument. (The scan may extend beyond the end of the first argument.) Token() is similar, but begins by skipping initial characters of the first argument until finding one from the second argument. \begindata{bp,538539136} \enddata{bp,538539136} \view{bpv,538539136,8,0,0} \example{\smaller{-- wc.n -- Count the words in a text. -- A word is a contiguous sequence of letters. -- To use as a main program: -- nessrun -b /usr/andrew/lib/ness/wc.n <filename> -- the number of words in the file is printed. --- To call from a Ness function: -- wc_countwords( <a marker for the text> ) -- returns an integer value giving the number of words. marker letters -- a list of the letters that \ -- may occur in words := "abcdefghijklmnopqrstuvwxyz" ~ "ABCDEFGHIJKLMNOPQRSTUVWXYZ" -- countwords(text) counts the number of sequences \ -- of adjacent letters --- 'text' is a marker for a substring of the full text -- token(x, m) searches forward from the beginning of -- x through the rest of text for the first sequence \ -- of characters all of which are in m --- Any Ness program can call wc_countwords(). -\leftindent{integer} function wc_countwords(text) \leftindent{integer count marker t count := 0 -- no t := token(text,letters) words so far -- find first word \leftindent{\leftindent{\leftindent{\leftindent{-- check to see if the token found -- starts after the end of the text}}}} while t /= "" and extent(t, text) /= "" do \leftindent{\ count := count + 1 -- count this word t := -- find next word -- start search at next(t), \ -- the first character after \ -- the preceding word token(finish(t), letters) -- if no word was found, token() -- returns an empty string} end while return count} end function \ \begindata{bp,538539200} \enddata{bp,538539200} \view{bpv,538539200,9,0,0} -- the main program initializes the global variable, -- reads a file, counts the number of words in it, -- and prints a line -- span(x, m) finds the longest initial substring of -- x composed of characters from m -- ~ indicates concatenation of string values -function main(args)\leftindent{ marker filename -- extract the file name from argument list -- find the initial substring of args which is \ -- composed of letters, digits, dots, and slashes. filename := span(start(args), letters ~ "./0123456789") -- read file, count words, and print result printline ("The text of " ~ filename ~ " has "\leftindent{ ~ textimage(wc_countwords(readfile(filename))) \ ~ " words")}} end function -- Select some text. Type ESC ESC, and respond to \ -- the "Ness:" prompt with -- wc_showcount() --- The following function will be called and will show \ -- the number of words in the current selection -function wc_showcount() \leftindent{\ TellUser(textimage(wc_countwords(\leftindent{\leftindent{ currentselection(currentinset))))}}} end function -- To use the following entry point, \ -- put these lines in your .atkinit: -- load ness -- call ness-load /usr/andrew/lib/ness/wc.n -- Then choosing the menu entry "Count words" will show \ -- a count of the words in the current selection -extend "view:textview"\leftindent{ on menu "Search/Spell,Count words~60"\leftindent{ wc_showcount()} end menu} end extend}} \begindata{bp,538539264} \enddata{bp,538539264} \view{bpv,538539264,10,0,0} Execution of this function for two examples went like this: \leftindent{ % \bold{nessrun wc.n wc.n} Compile okay. Elapsed time is 2903 msec. The text of wc.n has 407 words Execution okay. Elapsed time is 1532 msec. % \bold{nessrun wc.n ness.c} Compile okay. Elapsed time is 1522 msec. The text of ness.c has 4617 words Execution okay. Elapsed time is 12284 msec.} \bold{Appendix: /usr/andrew/lib/ness/demos} These demonstration files each contain an explanation of how to use them. In general, they should be copied to /tmp before editing them with ez; they must be read-write in order to display their effects. (And if they are editted directly in /usr/andrew/lib/ness/demos, they leave .CKP files around charged against your account.) \leftindent{\description{happybday.d - Click on the cake for birthday greetings. birth.db - A data base. sorting. Click on a reference in a CF field. Try capstest.d xmas.d - - Two different approaches to scanning text. Music! Pictures! bank.d - A prototypical parametric letter. calc.d - Yet another calculator. The most truly programmable yet. volume.d - Very simple illustration of something the NextStep application builder can't do. }} \begindata{bp,538539328} \enddata{bp,538539328} \view{bpv,538539328,11,0,0} Copyright 1992 Carnegie Mellon University and IBM. All rights reserved. \smaller{\smaller{$Disclaimer: Permission to use, copy, modify, and distribute this software and its documentation for any purpose is hereby granted without fee, provided that the above copyright notice appear in all copies and that both that copyright notice, this permission notice, and the following disclaimer appear in supporting documentation, and that the names of IBM, Carnegie Mellon University, and other copyright holders, not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. IBM, CARNEGIE MELLON UNIVERSITY, AND THE OTHER COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL IBM, CARNEGIE MELLON UNIVERSITY, OR ANY OTHER COPYRIGHT HOLDER BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. $ }}\enddata{text,538567440}