Introduction to Readers

Racket is not just a programming language; it is also a platform on which you can create your
own language or environment. Such a language uses the syntax you specify to perform the
operations you want. Basically, you are the ultimate designer of the language
environment you are setting up. If you want programs in your language to consist only
of ‘dots’ and ‘dashes’, as in Morse code, you can do that. Of course, you first need
to define what the combinations of ‘dots’ and ‘dashes’ mean. To achieve this, every
Racket program is first processed by a reader, which turns the program text into data. If you include “#lang racket”, the default
reader is used. However, you can specify your own reader instead, using the
#reader form. As a demonstration, let us write a reader that reads an arithmetic
expression of the form “3+6/5*9” and turns it into a Racket expression that evaluates it.
STEP 1: Open a new Racket file and declare the language it is written in. We want to
use racket, so the file starts with #lang racket.
STEP 2: We need to define two functions, read and read-syntax. First, provide these
two functions, and then define them.
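(provide read read-syntax)
;; read: parse the input and return a plain datum (no source locations)
(define (read in)
  (syntax->datum (read-syntax #f in)))
;; read-syntax: skip leading whitespace, then parse one arithmetic
;; expression into a syntax object (read-arith is defined in STEP 7)
(define (read-syntax src in)
  (skip-whitespace in)
  (read-arith src in))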
Usually, read is meant for reading data, whereas read-syntax is used for reading
programs. read-syntax returns a syntax object: the same datum that read would produce,
wrapped with information about its source, such as the source name, line, column, and
position. Here, read simply calls read-syntax and strips that information with syntax->datum.
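To see the difference, the sketch below runs Racket's built-in read and read-syntax (not the ones defined above) on the same text. The source name 'example is an arbitrary label chosen for this illustration.

```racket
#lang racket
;; Racket's built-in read and read-syntax applied to the same input text.
(define datum (read (open-input-string "(+ 1 2)")))
(define stx (read-syntax 'example (open-input-string "(+ 1 2)")))

datum                 ; a plain list: '(+ 1 2)
(syntax? stx)         ; #t: a syntax object, not a list
(syntax-source stx)   ; 'example, the source name we passed in
(syntax-position stx) ; 1: the expression starts at the first character
(syntax->datum stx)   ; '(+ 1 2) again, with the location stripped
```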
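STEP 3: Next, let us explore regexp-match, which matches a regular expression against a
string, a byte string, or an input port. When the input is a port, the characters up to
the end of a successful match are consumed from the port. The syntax for writing regular
expressions is documented at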
http://docs.racket-lang.org/reference/regexp.html
;; Consume any whitespace at the current position of the port in
(define (skip-whitespace in)
  (regexp-match #px"^\\s*" in))
In the above function, the pattern #px"^\\s*" is matched against the input port in. The #px
prefix marks a pregexp (Perl-compatible regular expression) literal, while #rx marks a plain
regexp literal. The two syntaxes differ slightly; pregexps add Perl-style features such as
the class \\s.
Following #px is the expression itself in quotes. ^ anchors the match at the start of the
input, which for a port is its current position. \\s matches a whitespace character (the
backslash is doubled because it appears inside a string literal), and * means zero or more
repetitions. Thus skip-whitespace consumes any leading whitespace; because * also allows
zero repetitions, the match succeeds even when there is no whitespace to skip.
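A quick way to watch skip-whitespace consume input is to run it on a string port. The sketch below repeats the definition so it runs on its own; port->string, which reads whatever is left on the port, comes from racket/port and is available in #lang racket.

```racket
#lang racket
;; skip-whitespace from STEP 3, repeated so this example is self-contained
(define (skip-whitespace in)
  (regexp-match #px"^\\s*" in))

(define in (open-input-string "   3+4"))
(define skipped (skip-whitespace in)) ; '(#"   "): matches on a port are byte strings
(define remaining (port->string in))  ; "3+4": only the leading spaces were consumed
```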
STEP 4: Next, we need the location of the text we are about to read. port-next-location
returns three values: the line number, column number, and position of the next character
to be read. The line and column are #f unless line counting has been enabled on the port
(with port-count-lines!); for ordinary ports the position is still reported.
(define-values (line col pos) (port-next-location in))
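The small experiment below shows both cases on string ports. Lines and positions count from 1, while columns count from 0. The port names counted and plain are just for illustration.

```racket
#lang racket
;; With line counting enabled, all three values are numbers.
(define counted (open-input-string "ab\ncd"))
(port-count-lines! counted)
(void (read-string 3 counted))   ; consume "ab\n"
(define-values (line col pos) (port-next-location counted))
;; line = 2, col = 0, pos = 4: the next character is the c on line 2

;; Without line counting, line and column are #f.
(define plain (open-input-string "ab\ncd"))
(void (read-string 3 plain))
(define-values (line2 col2 pos2) (port-next-location plain))
;; line2 = #f, col2 = #f, pos2 = 4
```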
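STEP 5: Let us now check that the text entered by the user is a well-formed arithmetic
expression.
(define expr-match
  (regexp-match
   ;; an operand, then any number of operator-operand pairs,
   ;; with no operator allowed right after the match
   #px"^([a-z]|[0-9]+)(?:[-+*/]([a-z]|[0-9]+))*(?![-+*/])"
   in))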
Let’s see what is going on here. The input in is matched against the expression
#px"^([a-z]|[0-9]+)(?:[-+*/]([a-z]|[0-9]+))*(?![-+*/])". The ^ anchors the match at the
start of the input, which must begin with an operand: either a single lowercase letter
([a-z]) or one or more digits ([0-9]+, where + means one or more). Next comes (?: ... )*,
a group repeated zero or more times, consisting of an operator (+, -, * or /, written as
the character class [-+*/]) followed by another operand. The (?: opening makes this group
non-capturing. Finally, (?![-+*/]) is a negative lookahead: it consumes nothing, but the
match fails if the next character is an operator, so the expression cannot end with a
dangling operator.
The ordinary parentheses ( ) capture substrings: regexp-match returns a list whose first
item is the whole match, followed by the text matched by each capturing group. When a group
is repeated, only its last match is reported, and a group that did not take part in the
match gives #f. If the pattern does not match, #f is returned. Go ahead and play with this
pattern. Try it on “3+4*5” and on a few more arithmetic expressions, and note the items
returned. Do these values match your intuition?
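Because expr-match is a value computed from the port rather than a function, the easiest way to experiment is to apply the same pattern to strings. check-arith below is a hypothetical helper name used only for this experiment; with a string input, the results are strings rather than byte strings.

```racket
#lang racket
;; The STEP 5 pattern applied to a string instead of a port.
;; check-arith is a hypothetical helper, not part of the reader.
(define (check-arith str)
  (regexp-match #px"^([a-z]|[0-9]+)(?:[-+*/]([a-z]|[0-9]+))*(?![-+*/])" str))

(check-arith "3+4*5") ; '("3+4*5" "3" "5"): the repeated group reports its last match
(check-arith "x*y")   ; '("x*y" "x" "y")
(check-arith "3+4+")  ; #f: the expression ends with an operator
(check-arith "3")     ; '("3" "3" #f): the second group never matched
```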
STEP 5 (continued): datum->syntax is an important function when constructing a reader. It
converts a datum v to a syntax object and attaches location information to it. The
location information can be provided as a list or a vector of the form
(source-name line column position span).
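;; to-syntax and make-srcloc go inside read-arith (STEP 7),
;; where src, line, col and pos are in scope.
;; Wrap v as a syntax object located delta characters into the expression
(define (to-syntax v delta span-str)
  (datum->syntax #f v (make-srcloc delta span-str)))
;; Build a location vector, or #f when line counting is off (line is #f)
(define (make-srcloc delta span-str)
  (and line
       (vector src line (+ col delta) (+ pos delta)
               (string-length span-str))))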
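To see what to-syntax produces, the sketch below calls datum->syntax directly with a location vector. The source name 'demo and the numbers in the vector are made up for illustration: they say the expression starts at line 1, column 0, position 1, and spans 7 characters.

```racket
#lang racket
;; A syntax object built by hand with an explicit location vector:
;; (vector source line column position span)
(define stx (datum->syntax #f '(+ 1 2) (vector 'demo 1 0 1 7)))

(syntax-source stx)   ; 'demo
(syntax-line stx)     ; 1
(syntax-column stx)   ; 0
(syntax-position stx) ; 1
(syntax-span stx)     ; 7
(syntax->datum stx)   ; '(+ 1 2)

;; Passing #f as the location, as make-srcloc does when line is #f,
;; gives a syntax object without location information.
(define bare (datum->syntax #f 'x #f))
(syntax-line bare)    ; #f
```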
STEP 6: Now, let us try to make sense of everything. First, look at the function below
and see how much of it you can understand on your own.
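;; Parse the string s into a syntax object; delta is the offset of s
;; from the start of the whole expression (used for source locations)
(define (parse-expr s delta)
  (match (or (regexp-match #rx"^(.*?)([+-])(.*)$" s)   ; split at the first + or -
             (regexp-match #rx"^(.*?)([*/])(.*)$" s))  ; otherwise at the first * or /
    [(list _ a-str op-str b-str)
     (define a-len (string-length a-str))
     (define a (parse-expr a-str delta))                ; left operand
     (define b (parse-expr b-str (+ delta 1 a-len)))    ; right operand, after the operator
     (define op (to-syntax (string->symbol op-str)
                           (+ delta a-len) op-str))
     (to-syntax (list op a b) delta s)]
    [_ (to-syntax (or (string->number s)                ; a number,
                      (string->symbol s))               ; or else a variable name
                  delta s)]))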
Let me start by giving you some hints. The first part is a form called match. You can
read more about match at http://docs.racket-lang.org/reference/match.html.
match compares a value against a sequence of patterns and evaluates the body of the first
clause whose pattern matches. For example, (match '(1 2 3) [(list a b c) 4] [(list a b) 2])
returns 4, because the three-element list matches the first pattern, (list a b c). The
expression (match '(1 2) [(list a b c) 4] [(list a b) 2]) returns 2, because only the second
pattern matches a two-element list.
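Patterns also bind the matched parts to names that the clause body can use, and _ matches anything without binding it (parse-expr uses _ for the whole-match string and as its fallback clause). describe is a hypothetical function for illustration.

```racket
#lang racket
;; describe is a hypothetical example of match with bindings and _.
(define (describe v)
  (match v
    [(list a b c) (+ a b c)]   ; three elements: add them
    [(list a b) (* a b)]       ; two elements: multiply them
    [_ 'no-match]))            ; anything else

(describe '(1 2 3)) ; 6
(describe '(4 5))   ; 20
(describe 7)        ; 'no-match
```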
Now, let's look at how match is used in our function. The or form evaluates its arguments
from left to right and returns the first value that is not #f, so the second regexp-match
is tried only when the first one fails. I have already explained regexp-match in detail.
The first pattern splits the string at its first + or -, so products and quotients stay
together inside the operand strings. Only when no + or - is left does the second pattern
split the string at its first * or /. This is what gives * and / higher precedence than
+ and -. If you try these expressions on their own, you will get a better feel for them.
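Trying the two patterns on their own makes the splitting visible. split-once is a hypothetical name for just the or expression from parse-expr.

```racket
#lang racket
;; The or expression from parse-expr: the */ pattern is tried only
;; when the +- pattern returns #f.
(define (split-once s)
  (or (regexp-match #rx"^(.*?)([+-])(.*)$" s)
      (regexp-match #rx"^(.*?)([*/])(.*)$" s)))

(split-once "3+6/5*9") ; '("3+6/5*9" "3" "+" "6/5*9")
(split-once "6/5*9")   ; '("6/5*9" "6" "/" "5*9")
(split-once "9")       ; #f: no operator left
```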
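When a pattern matches, regexp-match returns a list of four strings: the whole match, the
text before the operator, the operator, and the text after it. The first clause,
(list _ a-str op-str b-str), ignores the whole match and binds the other three. The two
operand strings are parsed recursively, and the results are combined by
(to-syntax (list op a b) delta s). If neither pattern matches, the or expression produces #f
and the second clause applies: s contains no operator, so it must be a number or a variable,
and the result is (to-syntax (or (string->number s) (string->symbol s)) delta s). The delta
argument records where s starts within the whole expression: the left operand starts at
delta, the operator at (+ delta a-len), and the right operand at (+ delta 1 a-len). Look at
this function in a little more detail to see how the expression is parsed recursively and
converted to Racket syntax.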
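The recursion is easier to follow without the location bookkeeping. parse-datum below is a hypothetical, location-free variant of parse-expr that returns plain lists, so the tree shape can be inspected directly.

```racket
#lang racket
;; parse-datum: same splitting as parse-expr, but no syntax objects.
(define (parse-datum s)
  (match (or (regexp-match #rx"^(.*?)([+-])(.*)$" s)
             (regexp-match #rx"^(.*?)([*/])(.*)$" s))
    [(list _ a-str op-str b-str)
     (list (string->symbol op-str) (parse-datum a-str) (parse-datum b-str))]
    [_ (or (string->number s) (string->symbol s))]))

(parse-datum "3+6/5*9") ; '(+ 3 (/ 6 (* 5 9)))
(parse-datum "3-2-1")   ; '(- 3 (- 2 1))
```

Because .*? is non-greedy, each string is split at its first operator of the chosen kind, so operators of equal precedence group to the right: 6/5*9 is read as 6/(5*9) and 3-2-1 as 3-(2-1), which differs from the usual left-to-right convention.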
STEP 7: Finally, let's put everything together by wrapping Steps 4 through 6 in a function
called read-arith:
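(define (read-arith src in)
  ;; ... the definitions from Steps 4 to 6 go here: line, col, pos,
  ;; expr-match, to-syntax, make-srcloc and parse-expr ...
  (unless expr-match
    (raise-read-error "bad arithmetic syntax"
                      src line col pos
                      (and pos (- (file-position in) pos))))
  ;; expr-match holds byte strings (the input is a port), so convert
  ;; the whole match (its car) to a string before parsing
  (parse-expr (bytes->string/utf-8 (car expr-match)) 0))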
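Here, we first check whether the input is a valid arithmetic expression. If it is not,
raise-read-error reports a read error that points at the offending text; raise-read-error is
provided by the syntax/readerr library, so add (require syntax/readerr) near the top of the
file. Otherwise, we parse the matched text.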
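For reference, one way to assemble Steps 2 to 7 into a single file is sketched below. It is a sketch under the assumptions above (the file is myreader.rkt, and the Step 4 to 6 definitions sit inside read-arith so they can see src, line, col, pos and in), not the only possible layout.

```racket
#lang racket
;; Sketch of the complete reader file (myreader.rkt), Steps 2 to 7.
(require syntax/readerr) ; provides raise-read-error

(provide read read-syntax)

(define (read in)
  (syntax->datum (read-syntax #f in)))

(define (read-syntax src in)
  (skip-whitespace in)
  (read-arith src in))

(define (skip-whitespace in)
  (regexp-match #px"^\\s*" in))

(define (read-arith src in)
  ;; Step 4: location of the next character
  (define-values (line col pos) (port-next-location in))
  ;; Step 5: match a well-formed expression
  (define expr-match
    (regexp-match
     #px"^([a-z]|[0-9]+)(?:[-+*/]([a-z]|[0-9]+))*(?![-+*/])"
     in))
  ;; Step 5 (continued): helpers that attach source locations
  (define (to-syntax v delta span-str)
    (datum->syntax #f v (make-srcloc delta span-str)))
  (define (make-srcloc delta span-str)
    (and line
         (vector src line (+ col delta) (+ pos delta)
                 (string-length span-str))))
  ;; Step 6: recursive parser
  (define (parse-expr s delta)
    (match (or (regexp-match #rx"^(.*?)([+-])(.*)$" s)
               (regexp-match #rx"^(.*?)([*/])(.*)$" s))
      [(list _ a-str op-str b-str)
       (define a-len (string-length a-str))
       (define a (parse-expr a-str delta))
       (define b (parse-expr b-str (+ delta 1 a-len)))
       (define op (to-syntax (string->symbol op-str)
                             (+ delta a-len) op-str))
       (to-syntax (list op a b) delta s)]
      [_ (to-syntax (or (string->number s)
                        (string->symbol s))
                    delta s)]))
  ;; Step 7: report bad input, otherwise parse the matched text
  (unless expr-match
    (raise-read-error "bad arithmetic syntax"
                      src line col pos
                      (and pos (- (file-position in) pos))))
  (parse-expr (bytes->string/utf-8 (car expr-match)) 0))
```

With this module, (read (open-input-string "4*3+9")) returns the datum (+ (* 4 3) 9), while input such as "3+" raises an error instead of producing a result.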
STEP 8: Check that your reader works. Save the above file as “myreader.rkt”. In
the interactions window, type #reader"myreader.rkt" 4*3+9 and execute it. The reader turns
4*3+9 into (+ (* 4 3) 9), so the result should be 21.
Note: For your projects, a number of you have asked how to use the normal symbols, such as
+ and *, to operate on quaternions. You can use rename-out inside provide to export the
functions you have defined under different names. For example, see the code below.
Name a file qsystem.rkt and write the following code in it.
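#lang racket
;; Export everything from racket except + and *,
;; and export +q and *q under the names + and *
(provide (except-out (all-from-out racket) + *)
         (rename-out [+q +] [*q *]))
;; placeholder definitions: replace these with quaternion operations
(define (+q x y) (+ x y 1000))
(define (*q x y) (* x y 1000))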
Notice that we provide all the bindings from racket except + and *, and then provide our
own +q and *q under the names + and *.
In a different file, you can write:
#lang racket
(require "qsystem.rkt")
(+ 2 3)
(* 2 3)
and run the file. It prints 1005 and 6000, because + and * now refer to +q and *q.
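If you want to try this without creating two files, the same idea can be sketched with submodules in a single file. The submodule names qsys and client and the variables r1 and r2 are hypothetical, chosen only for this illustration; the qsys submodule plays the role of qsystem.rkt and client plays the role of the second file.

```racket
#lang racket
;; Submodule standing in for qsystem.rkt
(module qsys racket
  (provide (except-out (all-from-out racket) + *)
           (rename-out [+q +] [*q *]))
  (define (+q x y) (+ x y 1000))
  (define (*q x y) (* x y 1000)))

;; Submodule standing in for the second file
(module client racket
  (require (submod ".." qsys)) ; its + and * shadow racket's
  (provide r1 r2)
  (define r1 (+ 2 3))   ; calls +q
  (define r2 (* 2 3)))  ; calls *q

(require 'client)
(list r1 r2) ; '(1005 6000)
```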