Bracmat

advertisement

Bracmat

A guided tour

Bart Jongejan

2013

The name

Applications

Core Methods

Why Bracmat?

Code examples

Documentation

Development

Download

Finale

Bracmat

(brachiat. – w. branches)

1741

Country on planet Nazar, inhabited by juniper trees with good facilities for astronomy, transcendental philosophy and mining.

Niels Klim, by Ludvig Holberg (1684-1754).

2013

Software for analysis and transformation of uncharted and complex data.

Examples of Applications

HTML cleaning validation of text corpora extraction of tabular data from text semantic analysis of text automatic workflow creation computer algebra investigation of email chains

HTML cleaning

ensure standard header and footer check links add closing tags warn if element not allowed in context remove or translate disallowed attributes translate deprecated elements (font, center) remove redundant elements (small big)

Validation of text corpora

Dutch corpora (the Netherlands/Flanders):

CGN(2006), MWE (2007), D-COI (2008),

DPC (2010), Lassi (2011), SoNaR (2012)

XML wellformedness, tag usage, sampling, visualisation for manual tasks, statistics, tabular parts of reports.

Data extraction from text

Semantic analysis

"Skal jeg tisse mere af diabetes?"

(“Do I have to urinate more because of diabetes?”)

First: tokenizer, tagger (opennlp), parser

(mate-tools)

Then: using patterns, find relation and concepts in parse tree. Result:

Polyuri DUE TO diabetes mellitus

Automatic Workflow Creation

Computer algebra

Face tracker: frame-by-frame video analysis

Head gestures: velocity, acceleration y: pixel position ∝ muscle force x: frame # a: head position c: head acceleration b: head velocity

Computer algebra

Solve three equations → acceleration

( c

. ( -1*St2^3

+ 2*St*St2*St3

+ St2*St4*period

+ -1*St^2*St4

+ -1*St3^2*period

)

^ -1

* ( -1*Sh*St2^2

+ Sh*St*St3

+ St*St2*Sth

+ St2*St2h*period

+ -1*St3*Sth*period

+ -1*St^2*St2h

)

)

Bracmat solution

Java code return

( Sh*(St*St3 - St2*St2)

+ Sth*(St*St2 - St3*period)

+ St2h*(St2*period - St*St)

)

/

( St2*(2*St*St3 - St2*St2 + St4*period)

- St*St*St4

- St3*St3*period

);

Email chains

Thu, 11 Oct 2012 16:28:59 +0200 (CEST)

From: bartj@hum.ku.dk

, 2012 10 11 14 28 59 200

MIME-Version: 1.0

Subject: Bracmat, GitHub

CC: Bart.jongejan@gmail.com

( Bracmat

, "Bart Jongejan"

X-pmrqc: 1

Priority: normal

. "Re: Bracmat"

--Alt-Boundary-3336.14371857

, (

. "Re: Bracmat"

Core methods

composition normalization pattern matching procedural logic

Composition

Compose complex expressions from simpler ones. binary operator complex expression expression another expression

Normalization

Automatically derive canonical expressions from unnormalized ones.

Pattern matching

Deconstruct complex expressions into simpler ones using pattern matching.

complex expression

?

pattern

?

simple expressions

Procedural logic

( complex expression

?

pattern

?

)

& 

| 

WHY?

How does a test particle move, given a set of basis vectors and a specific metric?

→ symbolic algebra

Symbolic manipulations easy, but MANY.

Pen and paper: doubts about correctness.

Computer: no errors.

1986: First version of Bracmat composes and normalises algebraic expressions.

1988: Pattern matching and procedural logic

All Bracmat expressions are binary trees:

+ x

^

2 a

* y

^

3 x ^2 + ( a * y ^3

Code examples

prompt answer follows

{?} 1+2

{!} 3*a keyboard input

{!}

{?}

3 a+a+a answer concise

“ a ” is a symbol, not a variable

{?} b+a

{!} a+b non standard order canonical order

3 , 3*a and a+b are canonical forms of

1+2 , a+a+a and b+a , respectively.

Operators (initially):

* multiplication

+ addition

^ exponentiation

\L

\D taking a logarithm taking a derivative

NO operators for subtraction and division

: a - b = a+(-1*b) a / b = a*(b^-1)

Bracmat expressions autonomously seek toward stable states.

Comparison: garbage falling on dump.

Small things slide down through the voids.

Chemicals interact.

Fumes disappear.

Finally all is quiet.

This is the

“Normal state”.

Landfill expression: landfill=ashtray+5*bag+barbie+

12*bottle+9*cork+stone+television

Truck’s contents: truck=apple+3*bag+paper+phone

Emptying the truck in the landfill:

(!landfill + !truck) : ?landfill

Landfill’s new stable state after a while: apple+ashtray+8*bag+barbie+12*bottle

+9*cork+paper+phone+stone+television

Landfill: not nice, but unwieldy & repulsive.

Good News: there are gems in the landfill.

If Hengki wants to obtain gems, he needs to: recognise valuable items and pick up those valuable items

Jonathan McIntosh, 2004

Hengki’s program most

!landfill: ?junk

of it doll pattern if you see doll, take it scan the landfill

+ ?n*((ken|barbie):?gem)

+ ?morejunk

after doll seen, go on with next step

& !HengkiStuff+!gem:?HengkiStuff

add doll to H.’s possessions and don’t return it to the landfill

|

& !junk+!morejunk+(!n+-1)*!gem

: ?landfill

if no doll seen, landfill and Hengki’s possessions remain unchanged

Four new binary operators:

=

:

&

| bind rhs to symbol on lhs match lhs (subject) with rhs (pattern) do rhs if lhs succeeds do rhs if lhs fails and two prefixes:

?

capture a value and bind it to the adjacent symbol.

!

produce the value that is bound to the adjacent symbol

= : & | and \D evaluated away (normally).

Dynamic forces that shake and break rubble.

, and .

do always persist through evaluation.

Residual forces that keep things in place.

Whitespace + * ^ and \L can persist, e.g.: y x → y x y+x → x+y y*x → x*y

But:

"" a, 0+a, 1*a → a, a, a

Examples of data structures that don’t change when (re)evaluated.

x^2,y^2,100

(.1 0 0)

(.0 0 -1)

3 algebraic expressions separated by commas

(.0 1 0)

BUT:

(1 0 0)

(0 0 -1)

9 numbers in a matrix

Lists built with whitespace, + and *

are always flattened!

(0 1 0) → 1 0 0 0 0 -1 0 1 0

Because blank, comma and dot are binary operators, this sentence is a perfect Bracmat expression.

{?} Because blank, comma and dot are binary operators, the sentence you are reading is a perfect Bracmat expression.

{!} Because blank

, comma and dot are binary operators

, the sentence you are reading is a perfect

Bracmat expression

.

Logical expansion of application domain of

Bracmat as:

“Software for analysis and transformation of uncharted and complex data.” textual

Example:

Check sentence syntax with Bracmat patterns:

(S=!NP !VP)

& (NP=!DET !N) non-terminals

& (VP=!V|!V !NP)

& (DET=a|the)

& (N=woman|man) terminals rule application

& (V=shoots|kisses)

& ( a man kisses the woman:!S

& put$"That's grammatical!\n"

| put$"not grammatical\n"

) screen output if failure screen output if success

Operator $ applies function to argument.

Only few built-in functions, e.g.: get get input from file, keyboard or string put write a result to file, screen or string lst str serialize a variable to file, screen or string concatenate a tree into a single string

Function application: str$(I m p l o d e) → Implode

Define your own functions.

E.g. syntax checker: check=

S NP VP DET N V

= only evaluates lhs.

& (NP=!DET !N)

& (VP=!V|!V !NP)

& (DET=a|the)

& (N=woman|man) before dot: declaration of local variables after dot: function body

& (V=shoots|kisses)

& !arg:!S

'check' succeeds if match ok

Call check with a sentence as argument:

{?} check$(a woman shoots)&okay|no

{!} okay

{?} check$(a man a man shoots)&T|F

{!} F

( ROOT

.

(VERB.Skal.skal)

(subj.PRON.jeg.jeg)

( vobj

. (VERB.tisse.tisse)

( dobj

. (ADJ.mere.mere)

( pobj

. (ADP.af.af) concept 1

(nobj.NOUN.diabetes.diabetes)

)

)

)

(pnct.X."?"."?") concept 2

)

[…]

| (its.hasTree)

Relation

$ ( !arg

Pattern

. (

Whatever matches this …

… must also match this.

( vobj

. ((VERB.?.?):?a)

( dobj

. ?b (pobj.(ADP.af.af) ?LC2)

)

)

)

)

Why “ !a

” ?

& !a (dobj.!b):?LC1

)

& "DUE TO" relation (‘attribute’)

Concept 1 pattern

(its.hasTree)

$ ( !LC1

. (

= (VERB.?.tisse)

(dobj.(ADJ.?.mere) ?)

)

) concept 1

→ (28442001.Polyuri)

Concept 2 pattern

(its.hasLemma)

$ ( !LC2

.(= sukkersyge ?|diabetes ?

)) concept 2

→ (73211009."diabetes mellitus ")

(attribute."DUE TO")

( concept1

. "Clinical Finding"

. 28442001.Polyuri

)

( concept2

. "Clinical Finding"

. 73211009."diabetes mellitus "

)

"Is sinning sincere?":?Mytext

& 0:?Bi

Initialise bigram accumulator subject

: ?

( %?One %?Two ?

matching pattern

& (!One !Two)+!Bi:?Bi

& ~

) accumulate

)

| lst$Bi embedded instructions fail! (backtrack)

(Bi=

(" " i)

+ 2*(" " s)

+ (I s)

+ (c e)

+ (e "?")

+ (e r)

+ 3*(i n)

+ 2*(n " ")

+ (n c)

+ (r e)

+ (s " ")

+ 2*(s i));

0:?Bi

& "из фрагментов текстов":?Mytext

& @( !Mytext

: ?

( (%?One & utf$!One)

(%?Two & utf$!Two)

?

& (!One !Two)+!Bi:?Bi

& ~

)

)

| lst$Bi

Bi=

(" " т)

+ (" " ф)

+ (а г)

+ (в " ")

+ (г м)

+ (е к)

+ (е н)

+ (з " ")

+ (и з)

+ (к с)

+ (м е)

+ (н т)

+ 2*(о в)

+ (р а)

+ (с т)

+ (т е)

+ 2*(т о)

+ (ф р);

Parse 0

n

1

n

Example of recursive pattern.

{?} AB= ( ""

| 0 !AB 1

) left hand side of

| is ”nothing”.

So this matches zero 0's and 1's.

recurse

{?} 0 0 1 1:!AB & good | bad

{!} good

Parse 0

n

1

n

2

n

{?} AB= ( "":?C

| 0 !AB 1 if zero 0 's and 1 's then also zero 2 's

& 2 !C:?C

) for each nested pair of

0 and 1 , add a 2 to C

{?} ABC=!AB !C

after parsing n 0 's and n

1 's, C contains n 2 's

{?} 0 0 1 1 2 2:!ABC & good | bad

{!} good

Documentation

http://jongejan.dk/bart/bracmat.html

Most complete documentation.

http://rosettacode.org/wiki/Category:Bracmat

Over 170 examples that can be compared with implementations in other programming languages.

Development

Evolution at moderate pace.

Great variety of snippets in valid.bra: guards against unexpected and unwanted behavioural changes and tests all C-code.

Behaviour described in file help (precursor to bracmat.html).

Changes are logged.

Download

Open source since 3 June 2003 (GPL).

http://cst.dk/download/bracmat/

Source code spanning period 1986-2012.

https://github.com/BartJongejan/Bracmat

Always the latest source code.

Finale

garbage collection most programming languages gem collection

Bracmat

“ S taten bruger dem imidlertid til at undersøge

Metalgruberne; thi ligesaa slet som de see hvad der er oven paa Jorden, saa fortreffeligen see de det der er inden i den.”

Finale

In Bracmat, trees are first class citizens.

Trees have autonomous behaviour.

Using pattern matching, the State controls trees.

The State itself consists of trees.

Finale

Ease of use, clarity and expressive power of

Bracmat’s patterns can compete with RE, SQL,

XQuery and Prolog.

Pattern matching as a primitive is strangely absent in popular programming languages, having died out with Snobol (1962 ~1990).

Modelling data in Bracmat is a small step away from understanding and controlling it.

Contents Part 2

Bracmat expressions

Expression evaluation

Bracmat in use

Bracmat expressions

Code and Data

Numbers

Strings

Lists

Structures

Booleans

Functions

Special symbols

Arrays

Objects

Hash Tables

Code and Data

No sharp distinction:

"I’m stable"

997^1/2

→ "I’m stable"

→ 997^1/2

(not so) stable → not so stable

998^1/2 → 2^1/2*499^1/2 i*i:>0&pos|"not pos" → "not pos"

Numbers

Arbitrary-precision arithmetic.

Rational numbers.

No floating point!

2/3 + -1/6 → 1/2

1/99+-1/100+1/101 → 10001/999900

2^216091+-1 → 746093103…815528447

(65050 digits, 31 st Mersenne prime)

Strings

Strings cannot contain null-bytes, otherwise no restrictions.

A

"A string can extend over multiple lines"

"2^216091+-1"

Lists

Sums, products and sentences.

x + 4 + a + 8 p * q * z

This list has five elements

Lists inside lists: a + 5 * e ^ (i * pi + x) + b

Lists (continued)

a + b : ?x + b + ?y

a * b : ?x * b * ?y

a b : ?x b ?y

{ y := 0 }

{ y := 1 }

{ y := ""}

0 , 1 and the empty string "" are identity

(or neutral) elements in sums, products and sentences respectively.

Structures

(uni . "Københavns Universitet")

(institut . CST)

(publications.(2011. pub1 pub2)

(2012. pub3 pub4)

(2013. pub5 pub6 pub7

)

)

Booleans

There is no separate boolean type.

Each node in a Bracmat expression has a success/failure status flag.

a success

~ failure

1+2 successful evaluation to 3 a:b failing match operation

Functions

(=a b. !arg:(?a.?b) & (!b.!a)) parameter local variables

$ (jeg.går) returned value argument function application result

→ går.jeg

Special symbols

e Euler’s constant, the basis of the natural logarithm, 2.7182… x \D (e ^ x) → e ^ x i unit imaginary number i * i → -1 pi the ratio of a circle's circumference to its diameter e ^ (i * pi) → -1

Arrays

An array “ A ” is not a variable, it is the stack of all variables named “ A ”.

tbl$(A,100) create array A size 100

117:?(41 $ A) assign to element A[41]

!(41$A) inspect element A[41] tbl$(A,0) delete array A

Objects

Objects are the only Bracmat expressions that can change.

(language=(iso639=) (spkrs=))

& new$language:?Danish

& da:?(Danish..iso639)

& 6M:?(Danish..spkrs)

Hash Tables

built-in object type effectively store&search key-value pairs.

new$hash:?H

& (H..insert)$(Danmark.!Danish)

& (H..find)$Danmark

→ Danmark.(=(iso639=da) (spkrs=6M))

Expression Evaluation

Definition

Program flow

Pattern matching

Macro evaluation

λ – calculus

Normalization

Definition

Right hand side of = operator is not evaluated.

pattern definition

J = ? ?#x ?; function definition double=.!arg+!arg; double$7

(=.!arg+!arg)$7 anonymous function

→ 14

→ 14

Program flow

Evaluation from left to right, depth first.

firstthis & thenthis ifnotthis | thenthis whl'( body ) fun $ arg (or fun ' arg )

!

subroutine

Pattern matching

subject : pattern

Pattern matching evaluator

Normal evaluator

Pattern doesn’t evolve, side effects possible.

Primary result: subject/success or failure.

& operator: escape to normal evaluation.

@ prefix indicates string pattern matching.

string subject

@( kabbatus

: ?

(?%x & rev$!x:?y)

!y

?

) escape to normal evaluation string pattern matching side effect: assignment embedded pattern matching operation normally evaluated

No regular expressions. Instead, use or

@( string : pattern )

( tree : pattern ) nesting & recursion regex pattern no

/yes yes named variables non-string subject greedy no no

/yes yes yes yes/no no

Regex:

DO {stay in string world}

UNTIL (regex clear as mud)

THEN use other tool

Bracmat: A list, to begin with tokenize input string → tree

XPath

XQuery

SQL

LINQ

CQL

WHILE(more and more interesting)

{ pattern match tree

& make more interesting tree

}

Macro evaluation

Replace variable by the value of the variable.

X=6;i=0;mltpls=; Lhs: empty string

' ( whl

' ( !i+1:<10:?i

& !mltpls ""$X*!i:?mltpls

)

Lhs: empty string

)

: ?code;

Macro after evaluation. lst$code;

(code=

= whl

' ( !i+1:<10:?i

' replaced by =

& !mltpls 6 *!i:?mltpls

)

);

""$X replaced by 6

Macros

(1) speed up performance tight loops pattern matching over long lists,

(2) incrementally build the

”Greatest Common Pattern” that matches a given set of data structures. (Tree kernel)

(3) create or change code at run time

λ - calculus

The lambda abstraction

(λx.x)y translates to

/('(x.$x))$y

For factorial and Fibonacci the hard way, see http://rosettacode.org/wiki/Y_combinator#Bracmat

Normalization

flatten identity sort combine

.

,

WS + * ^ \L a a a a

"" 0 1 a a a a a a

Normalization: flatten

(a,b,c),(d,e),f,g → a,b,c,d,e,f,g

, ,

, , → a , a , , , b , b c d e f g c ,

Because of associativity d , e , f g

Normalization: identity

"" a b c "" "" d "" → a b c d

0+a+b+c+0+0+d+0 → a+b+c+d

1*a*b*c*1*1*d*1 → a*b*c*d

Because "" 0 and 1 are neutral elements

Normalization: sort

f+a+b+g+c+e+d → a+b+c+d+e+f+g f*a*b*g*c*e*d → a*b*c*d*e*f*g e*a*b*c*d*f*g

3*b*c+a*b+a*e^(k+j)*i

↓ a*b+3*b*c+i*e^(j+k)*a

Because + and * are commutative

Normalization: combine

eat+the+the+zuppe → eat+2*the+zuppe

2*a*b+(a+-1*b)^2 → a^2+b^2

Because multiplication is distributive over addition

Bracmat in use

Surface Appearance

Create a new program

Canonical lay-out edit-save-reformat-reopen loop

Requirements

Build

Run code

Memory

Unicode support

SGML, XML, HTML support

Surface Appearance

white space!

Operators =.,|&: +*^\L\D'$_

Prefixes [~/#<>%@`?!

and !!

. Also -

Parentheses and quotes override default.

Comments: { yes, not stored in memory! }

Layout: free, convert to canonical with lst

Create a new program

(1) From scratch. Easy: no boiler plate.

(2) Wizzard project.bra creates skeleton: description, program stub, facility to help you keep your code nicely formatted.

Canonical lay-out

Principles:

- reasonable margins

- indented

- pair of parentheses aligned

(horizont|vertic)ally

- heading operator aligned w. parentheses

Some coding errors become conspicuous.

edit-save-reformat-reopen loop

Use combination of terminal window or

DOS-prompt and auto-reloading text editor.

Edit your code, save, type !r

in Bracmat,

(auto)reload program in editor.

{?} (diet =. !arg:? (fish|meat|chicken) ? & carnivore |

{1} !arg:? (yoghurt|milk|cheese) ? & veggie

{1} | !arg:? (fruit|nuts) ? & vegan | dontknow

{1} )

{!} diet

S 0,00 sec

{?} lst$diet

(diet=

. !arg:? (fish|meat|chicken) ?

& carnivore

| !arg

: ? (yoghurt|milk|cheese) ?

& veggie

| !arg:? (fruit|nuts) ?&vegan

| dontknow);

{!} diet

Requirements

C-compiler with standard libraries

32-bit or 64-bit OS bracmat.c and xml.c (< 20 000 lines in all) nothing more

Build

E.g. compile and link with GNU-C: gcc -std=c99 -pedantic -Wall -O3 static -DNDEBUG -o bracmat bracmat.c xml.c

Windows, dynamically linked, 32 bit: 100Kb

Linux, statically linked, 64 bit: 1 Mb

Run code

w/0 parameters: interactive (REPL) each parameter is evaluated, left to right.

Bracmat as embedded s/w:

JNI – call Bracmat from Java

Linked with C or C++ code, Bracmat can be called and can even call C-functions.

Memory

Nodes

Very compact, 4,8,12,16, … bytes per node

Reference counting

Structure sharing

Unreferenced nodes freed

Variables dynamically scoped (lex.bra → lexical)

Unicode

UTF8 assumed, fall back to ISO-8859-1

Ø  ø in Unicode, UTF-8: utf$low$chu$216 → 248

Ø  ø in ISO-8859-1: asc$low$chr$216 → 248 upp$ ру́ → РУ́

SGML, XML, HTML

Robust and fast parsing, done in C.

Transformation to Bracmat expression.

No nesting of elements. Separate step.

XML/HTML entities → UTF-8 get$(" <p class='Q' enabled

>Sams&oslash;<br/</p> ",MEM,HT,ML)

→ (p.(class.Q) (enabled.)) Samsø

(br.,) (.p.)

Download