Bart Jongejan
2013
The name
Applications
Core Methods
Why Bracmat?
Code examples
Documentation
Development
Download
Finale
(brachiat. – w. branches)
1741
Country on planet Nazar, inhabited by juniper trees with good facilities for astronomy, transcendental philosophy and mining.
Niels Klim, by Ludvig Holberg (1684-1754).
2013
Software for analysis and transformation of uncharted and complex data.
- HTML cleaning
- validation of text corpora
- extraction of tabular data from text
- semantic analysis of text
- automatic workflow creation
- computer algebra
- investigation of email chains
HTML cleaning:
- ensure standard header and footer
- check links
- add closing tags
- warn if element not allowed in context
- remove or translate disallowed attributes
- translate deprecated elements (font, center)
- remove redundant elements (small big)
Dutch corpora (the Netherlands/Flanders):
CGN (2006), MWE (2007), D-COI (2008),
DPC (2010), Lassy (2011), SoNaR (2012)
XML wellformedness, tag usage, sampling, visualisation for manual tasks, statistics, tabular parts of reports.
"Skal jeg tisse mere af diabetes?"
(“Do I have to urinate more because of diabetes?”)
First: tokenizer, tagger (opennlp), parser (mate-tools)
Then: using patterns, find the relation and the concepts in the parse tree. Result:
Polyuri DUE TO diabetes mellitus
Face tracker: frame-by-frame video analysis
Head gestures: velocity, acceleration (∝ muscle force)
x: frame #
y: pixel position
a: head position
b: head velocity
c: head acceleration
Solve three equations → acceleration
Bracmat solution:
( c
. ( -1*St2^3
  + 2*St*St2*St3
  + St2*St4*period
  + -1*St^2*St4
  + -1*St3^2*period
  )
  ^ -1
* ( -1*Sh*St2^2
  + Sh*St*St3
  + St*St2*Sth
  + St2*St2h*period
  + -1*St3*Sth*period
  + -1*St^2*St2h
  )
)
Java code:
return
  ( Sh*(St*St3 - St2*St2)
  + Sth*(St*St2 - St3*period)
  + St2h*(St2*period - St*St)
  )
  /
  ( St2*(2*St*St3 - St2*St2 + St4*period)
  - St*St*St4
  - St3*St3*period
  );
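The expression for c has the shape of the quadratic coefficient in a least-squares fit of h = a + b*t + c*t^2, solved with Cramer's rule. The Python sketch below checks this under assumptions that the slide does not state: that period is the number of frames in the window, that St, St2, St3 and St4 are sums of powers of the frame index t, and that Sh, Sth and St2h are sums of h, t*h and t^2*h. The function name quadratic_coefficient is made up for this sketch.

```python
from fractions import Fraction

def quadratic_coefficient(h):
    """Least-squares coefficient c of h = a + b*t + c*t**2 for t = 0, 1, 2, ...

    Expects at least three integer positions (e.g. pixel rows).
    """
    period = len(h)                      # assumed meaning: number of frames
    t = range(period)
    St = sum(t)
    St2 = sum(x**2 for x in t)
    St3 = sum(x**3 for x in t)
    St4 = sum(x**4 for x in t)
    Sh = sum(h)
    Sth = sum(x * y for x, y in zip(t, h))
    St2h = sum(x * x * y for x, y in zip(t, h))
    # Same grouping of terms as the Java expression above.
    numerator = (Sh * (St * St3 - St2 * St2)
                 + Sth * (St * St2 - St3 * period)
                 + St2h * (St2 * period - St * St))
    denominator = (St2 * (2 * St * St3 - St2 * St2 + St4 * period)
                   - St * St * St4
                   - St3 * St3 * period)
    return Fraction(numerator, denominator)

positions = [3 + 2 * t + 5 * t * t for t in range(5)]
print(quadratic_coefficient(positions))  # prints 5
```

For positions that follow an exact quadratic, the formula returns the coefficient of t^2 exactly, which is a quick sanity check on the symbolic result.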
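Investigation of email chains: raw message headers
Thu, 11 Oct 2012 16:28:59 +0200 (CEST)
From: bartj@hum.ku.dk
MIME-Version: 1.0
Subject: Bracmat, GitHub
CC: Bart.jongejan@gmail.com
X-pmrqc: 1
Priority: normal
--Alt-Boundary-3336.14371857
[Figure: the same message as a Bracmat expression. Surviving fragments: the timestamp 2012 10 11 14 28 59 200, the name "Bart Jongejan", the keyword Bracmat and the subject "Re: Bracmat".]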
- composition
- normalization
- pattern matching
- procedural logic
Compose complex expressions from simpler ones.
[Diagram: a binary operator joins an expression and another expression into a complex expression.]
Automatically derive canonical expressions from unnormalized ones.
Deconstruct complex expressions into simpler ones using pattern matching.
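[Diagram: a complex expression is matched against a pattern, which yields simple expressions. A match ( complex expression : pattern ) succeeds or fails, and & and | select what happens next.]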
WHY?
How does a test particle move, given a set of basis vectors and a specific metric?
→ symbolic algebra
Symbolic manipulations easy, but MANY.
Pen and paper: doubts about correctness.
Computer: no errors.
1986: First version of Bracmat composes and normalises algebraic expressions.
1988: Pattern matching and procedural logic
All Bracmat expressions are binary trees:
        +
      /   \
     ^     *
    / \   / \
   x   2 a   ^
            / \
           y   3
x^2 + (a * y^3)
{?} 1+2
{!} 3
{?} a+a+a
{!} 3*a
{?} b+a
{!} a+b
{?} is the prompt and is followed by keyboard input; {!} means that the answer follows.
The answer 3*a is concise: “a” is a symbol, not a variable.
b+a is in non-standard order, a+b is in canonical order.
3, 3*a and a+b are canonical forms of 1+2, a+a+a and b+a, respectively.
Operators (initially):
* multiplication
+ addition
^ exponentiation
\L taking a logarithm
\D taking a derivative
NO operators for subtraction and division:
a - b = a+(-1*b)
a / b = a*(b^-1)
Bracmat expressions autonomously settle into stable states.
Comparison: garbage falling on dump.
Small things slide down through the voids.
Chemicals interact.
Fumes disappear.
Finally all is quiet.
This is the “Normal state”.
Landfill expression:
landfill=ashtray+5*bag+barbie+12*bottle+9*cork+stone+television
Truck’s contents:
truck=apple+3*bag+paper+phone
Emptying the truck in the landfill:
(!landfill + !truck) : ?landfill
Landfill’s new stable state after a while:
apple+ashtray+8*bag+barbie+12*bottle+9*cork+paper+phone+stone+television
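As a rough Python analogue (not how Bracmat is implemented), a sum of symbols with integer coefficients behaves like a multiset: adding two sums merges like terms, and the canonical form lists the terms in sorted order. The helper name canonical is made up for this sketch.

```python
from collections import Counter

landfill = Counter(ashtray=1, bag=5, barbie=1, bottle=12, cork=9,
                   stone=1, television=1)
truck = Counter(apple=1, bag=3, paper=1, phone=1)

def canonical(total):
    """Render a multiset as a sum with sorted terms and merged coefficients."""
    terms = []
    for symbol in sorted(total):
        count = total[symbol]
        terms.append(symbol if count == 1 else f"{count}*{symbol}")
    return "+".join(terms)

landfill = landfill + truck   # emptying the truck in the landfill
print(canonical(landfill))
```

The printed sum has the same terms and coefficients as the landfill's new stable state: the two bag terms are combined into 8*bag.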
Landfill: not nice, but unwieldy & repulsive.
Good News: there are gems in the landfill.
If Hengki wants to obtain gems, he needs to: recognise valuable items and pick up those valuable items
Jonathan McIntosh, 2004
Hengki’s program:
  !landfill                            { scan the landfill }
: ?junk                                { most of it }
  + ?n*((ken|barbie):?gem)             { doll pattern: if you see a doll, take it }
  + ?morejunk
& !HengkiStuff+!gem:?HengkiStuff       { after doll seen, go on with next step: add doll to H.’s possessions }
& !junk+!morejunk+(!n+-1)*!gem
: ?landfill                            { and don’t return it to the landfill }
|                                      { if no doll seen, landfill and Hengki’s possessions remain unchanged }
Four new binary operators:
=  bind rhs to symbol on lhs
:  match lhs (subject) with rhs (pattern)
&  do rhs if lhs succeeds
|  do rhs if lhs fails
and two prefixes:
?  capture a value and bind it to the adjacent symbol
!  produce the value that is bound to the adjacent symbol
= : & | and \D are evaluated away (normally).
Dynamic forces that shake and break rubble.
, and . always persist through evaluation.
Residual forces that keep things in place.
Whitespace + * ^ and \L can persist, e.g.:
y x → y x
y+x → x+y
y*x → x*y
But:
"" a → a
0+a → a
1*a → a
Examples of data structures that don’t change when (re)evaluated.
x^2,y^2,100                       3 algebraic expressions separated by commas
(.1 0 0) (.0 0 -1) (.0 1 0)       9 numbers in a matrix
BUT:
(1 0 0) (0 0 -1) (0 1 0) → 1 0 0 0 0 -1 0 1 0
Lists built with whitespace, + and * are always flattened!
Because blank, comma and dot are binary operators, this sentence is a perfect Bracmat expression.
{?} Because blank, comma and dot are binary operators, the sentence you are reading is a perfect Bracmat expression.
{!} Because blank
, comma and dot are binary operators
, the sentence you are reading is a perfect
Bracmat expression
.
Logical expansion of the application domain of Bracmat:
“Software for analysis and transformation of uncharted and complex textual data.”
Example:
Check sentence syntax with Bracmat patterns:
  (S=!NP !VP)                           { non-terminals }
& (NP=!DET !N)
& (VP=!V|!V !NP)
& (DET=a|the)                           { terminals }
& (N=woman|man)
& (V=shoots|kisses)
& ( a man kisses the woman:!S           { rule application }
  & put$"That's grammatical!\n"         { screen output if success }
  | put$"not grammatical\n"             { screen output if failure }
  )
Operator $ applies function to argument.
Only a few built-in functions, e.g.:
get  get input from file, keyboard or string
put  write a result to file, screen or string
lst  serialize a variable to file, screen or string
str  concatenate a tree into a single string
Function application: str$(I m p l o d e) → Implode
Define your own functions.
E.g. syntax checker:
check=
  S NP VP DET N V     { before dot: declaration of local variables }
. (S=!NP !VP)         { after dot: function body }
& (NP=!DET !N)
& (VP=!V|!V !NP)
& (DET=a|the)
& (N=woman|man)
& (V=shoots|kisses)
& !arg:!S             { 'check' succeeds if match ok }
= only evaluates lhs.
Call check with a sentence as argument:
{?} check$(a woman shoots)&okay|no
{!} okay
{?} check$(a man a man shoots)&T|F
{!} F
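For readers who do not know Bracmat, the same toy grammar can be checked with a small backtracking recogniser in Python. This is only an illustrative sketch: the names GRAMMAR, parse and check_sentence are made up here, and the code does not reproduce Bracmat's pattern matcher.

```python
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["DET", "N"]],
    "VP": [["V"], ["V", "NP"]],
    "DET": [["a"], ["the"]],
    "N": [["woman"], ["man"]],
    "V": [["shoots"], ["kisses"]],
}

def parse(symbol, words, start):
    """Yield every position where symbol can end when it starts at start."""
    if symbol not in GRAMMAR:                      # terminal word
        if start < len(words) and words[start] == symbol:
            yield start + 1
        return
    for alternative in GRAMMAR[symbol]:            # try alternatives in order
        ends = [start]
        for part in alternative:
            ends = [e2 for e in ends for e2 in parse(part, words, e)]
        yield from ends

def check_sentence(sentence):
    """True if the whole sentence can be derived from S."""
    words = sentence.split()
    return len(words) in parse("S", words, 0)

print(check_sentence("a woman shoots"))        # prints True
print(check_sentence("a man a man shoots"))    # prints False
```

The Bracmat version needs no such machinery: the grammar rules are themselves the patterns.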
( ROOT
.   (VERB.Skal.skal)
    (subj.PRON.jeg.jeg)
    ( vobj
    .   (VERB.tisse.tisse)
        ( dobj
        .   (ADJ.mere.mere)
            ( pobj
            .   (ADP.af.af)
                (nobj.NOUN.diabetes.diabetes)
            )
        )
    )
    (pnct.X."?"."?")
)
Concept 1: tisse mere. Concept 2: diabetes.
Relation pattern
[…]
| (its.hasTree)
  $ ( !arg
    . (
        ( vobj
        . ((VERB.?.?):?a)                { whatever matches (VERB.?.?) must also match ?a }
          ( dobj
          . ?b (pobj.(ADP.af.af) ?LC2)
          )
        )
      )
    )
  & !a (dobj.!b):?LC1                    { why “!a”? }
  )
& "DUE TO"                               { relation (‘attribute’) }
Concept 1 pattern
(its.hasTree)
$ ( !LC1
  . ( = (VERB.?.tisse)
        (dobj.(ADJ.?.mere) ?)
    )
  )
→ (28442001.Polyuri)
Concept 2 pattern
(its.hasLemma)
$ ( !LC2
  . (= sukkersyge ?|diabetes ?)
  )
→ (73211009."diabetes mellitus ")
(attribute."DUE TO")
( concept1
. "Clinical Finding"
. 28442001.Polyuri
)
( concept2
. "Clinical Finding"
. 73211009."diabetes mellitus "
)
"Is sin in sincere?":?Mytext
& 0:?Bi                          { initialise bigram accumulator }
& @( !Mytext                     { subject }
   : ?                           { matching pattern }
     ( %?One %?Two ?
     & (!One !Two)+!Bi:?Bi       { embedded instructions: accumulate }
     & ~                         { fail! (backtrack) }
     )
   )
| lst$Bi
(Bi=
(" " i)
+ 2*(" " s)
+ (I s)
+ (c e)
+ (e "?")
+ (e r)
+ 3*(i n)
+ 2*(n " ")
+ (n c)
+ (r e)
+ (s " ")
+ 2*(s i));
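For comparison, a Python sketch of the same bigram count. It uses the text "Is sin in sincere?", which reproduces the twelve bigrams and the counts listed above. The function name bigrams is made up for this sketch, and a Counter stands in for Bracmat's self-normalising sum.

```python
from collections import Counter

def bigrams(text):
    """Count every pair of adjacent characters in text."""
    return Counter(zip(text, text[1:]))

counts = bigrams("Is sin in sincere?")
for (one, two), n in sorted(counts.items()):
    print(n, repr(one), repr(two))
```

Where the Bracmat pattern forces backtracking with ~ to visit every position, the Python version walks the string once with zip.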
0:?Bi
& "из фрагментов текстов":?Mytext
& @( !Mytext
: ?
( (%?One & utf$!One)
(%?Two & utf$!Two)
?
& (!One !Two)+!Bi:?Bi
& ~
)
)
| lst$Bi
Bi=
(" " т)
+ (" " ф)
+ (а г)
+ (в " ")
+ (г м)
+ (е к)
+ (е н)
+ (з " ")
+ (и з)
+ (к с)
+ (м е)
+ (н т)
+ 2*(о в)
+ (р а)
+ (с т)
+ (т е)
+ 2*(т о)
+ (ф р);
0^n 1^n
Example of recursive pattern.
{?} AB= ( ""          { left hand side of | is ”nothing”, so this matches zero 0's and 1's }
        | 0 !AB 1     { recurse }
        )
{?} 0 0 1 1:!AB & good | bad
{!} good
0^n 1^n 2^n
{?} AB= ( "":?C       { if zero 0's and 1's then also zero 2's }
        | 0 !AB 1
        & 2 !C:?C     { for each nested pair of 0 and 1, add a 2 to C }
        )
{?} ABC=!AB !C        { after parsing n 0's and n 1's, C contains n 2's }
{?} 0 0 1 1 2 2:!ABC & good | bad
{!} good
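The language recognised here (n 0's, then n 1's, then n 2's) cannot be described by a context-free grammar. A direct Python check of the same language is sketched below. It is not a translation of the recursive pattern: it simply counts the leading 0's and compares against the expected sequence. The name is_0n1n2n is made up for this sketch.

```python
def is_0n1n2n(tokens):
    """True if tokens are n 0's, then n 1's, then n 2's (n may be zero)."""
    tokens = list(tokens)
    n = 0
    while n < len(tokens) and tokens[n] == 0:
        n += 1
    return tokens == [0] * n + [1] * n + [2] * n

print(is_0n1n2n([0, 0, 1, 1, 2, 2]))  # prints True
print(is_0n1n2n([0, 0, 1, 1, 2]))     # prints False
```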
http://jongejan.dk/bart/bracmat.html
Most complete documentation.
http://rosettacode.org/wiki/Category:Bracmat
Over 170 examples that can be compared with implementations in other programming languages.
Evolution at moderate pace.
Great variety of snippets in valid.bra: guards against unexpected and unwanted behavioural changes and tests all C-code.
Behaviour described in file help (precursor to bracmat.html).
Changes are logged.
Open source since 3 June 2003 (GPL).
http://cst.dk/download/bracmat/
Source code spanning period 1986-2012.
https://github.com/BartJongejan/Bracmat
Always the latest source code.
Garbage collection: most programming languages.
Gem collection: Bracmat.
“The State, however, uses them to examine the metal mines; for as poorly as they see what is on the surface of the earth, so excellently do they see what is inside it.”
In Bracmat, trees are first class citizens.
Trees have autonomous behaviour.
Using pattern matching, the State controls trees.
The State itself consists of trees.
Ease of use, clarity and expressive power of
Bracmat’s patterns can compete with RE, SQL,
XQuery and Prolog.
Pattern matching as a primitive is strangely absent in popular programming languages, having died out with Snobol (1962 ~1990).
Modelling data in Bracmat is a small step away from understanding and controlling it.
Bracmat expressions
Expression evaluation
Bracmat in use
Code and Data
Numbers
Strings
Lists
Structures
Booleans
Functions
Special symbols
Arrays
Objects
Hash Tables
No sharp distinction:
"I’m stable" → "I’m stable"
997^1/2 → 997^1/2
(not so) stable → not so stable
998^1/2 → 2^1/2*499^1/2
i*i:>0&pos|"not pos" → "not pos"
Arbitrary-precision arithmetic.
Rational numbers.
No floating point!
2/3 + -1/6 → 1/2
1/99+-1/100+1/101 → 10001/999900
2^216091+-1 → 746093103…815528447
(65050 digits, 31st Mersenne prime)
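The same exact arithmetic can be reproduced in Python with the standard fractions module and built-in big integers. This is only a cross-check of the numbers above, not a description of how Bracmat computes them.

```python
from fractions import Fraction
import math

print(Fraction(2, 3) + Fraction(-1, 6))                        # prints 1/2
print(Fraction(1, 99) - Fraction(1, 100) + Fraction(1, 101))   # prints 10001/999900

mersenne = 2**216091 - 1          # exact, arbitrary-precision integer
print(mersenne.bit_length())      # prints 216091

# 2**p is never a power of ten, so 2**p - 1 has as many decimal digits as 2**p.
digits = math.floor(216091 * math.log10(2)) + 1
print(digits)                     # prints 65050
```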
Strings cannot contain null-bytes, otherwise no restrictions.
A
"A string can extend over multiple lines"
"2^216091+-1"
Sums, products and sentences.
x + 4 + a + 8
p * q * z
This list has five elements
Lists inside lists: a + 5 * e ^ (i * pi + x) + b
a + b : ?x + b + ?y    { y := 0 }
a * b : ?x * b * ?y    { y := 1 }
a b : ?x b ?y          { y := "" }
0 , 1 and the empty string "" are identity
(or neutral) elements in sums, products and sentences respectively.
(uni . "Københavns Universitet")
(institut . CST)
(publications.(2011. pub1 pub2)
(2012. pub3 pub4)
(2013. pub5 pub6 pub7
)
)
There is no separate boolean type.
Each node in a Bracmat expression has a success/failure status flag.
a      success
~      failure
1+2    successful evaluation to 3
a:b    failing match operation
(=a b. !arg:(?a.?b) & (!b.!a))   { local variables: a b; parameter: !arg; returned value: (!b.!a) }
$ (jeg.går)                      { function application to the argument (jeg.går) }
→ går.jeg                        { result }
e   Euler’s number, the base of the natural logarithm, 2.7182…   x \D (e ^ x) → e ^ x
i   unit imaginary number   i * i → -1
pi  the ratio of a circle's circumference to its diameter   e ^ (i * pi) → -1
An array “ A ” is not a variable, it is the stack of all variables named “ A ”.
tbl$(A,100)      create array A of size 100
117:?(41 $ A)    assign to element A[41]
!(41$A)          inspect element A[41]
tbl$(A,0)        delete array A
Objects are the only Bracmat expressions that can change.
(language=(iso639=) (spkrs=))
& new$language:?Danish
& da:?(Danish..iso639)
& 6M:?(Danish..spkrs)
Hash table: a built-in object type to store and search key-value pairs effectively.
new$hash:?H
& (H..insert)$(Danmark.!Danish)
& (H..find)$Danmark
→ Danmark.(=(iso639=da) (spkrs=6M))
Definition
Program flow
Pattern matching
Macro evaluation
λ – calculus
Normalization
Right hand side of = operator is not evaluated.
J = ? ?#x ?;                  { pattern definition }
double=.!arg+!arg;            { function definition }
double$7 → 14
(=.!arg+!arg)$7 → 14          { anonymous function }
Evaluation from left to right, depth first.
firstthis & thenthis
ifnotthis | thenthis
whl'( body )
fun $ arg (or fun ' arg )
!subroutine
subject : pattern
The subject is handled by the normal evaluator, the pattern by the pattern matching evaluator.
Pattern doesn’t evolve, side effects possible.
Primary result: subject/success or failure.
& operator: escape to normal evaluation.
@ prefix indicates string pattern matching.
@( kabbatus                  { string subject }
 : ?
   (?%x & rev$!x:?y)         { & escapes to normal evaluation; rev$!x:?y is an embedded pattern matching operation, normally evaluated; side effect: assignment }
   !y
   ?
 )
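The pattern looks for a non-empty substring x that is immediately followed by its own reverse. A Python sketch of that search follows. The function name mirrored and the search order (shortest prefix first, then shortest x) are choices made for this sketch; no claim is made that Bracmat returns the same pair.

```python
def mirrored(text):
    """Return the first (x, reversed x) with x directly followed by its reverse."""
    for start in range(len(text)):
        for length in range(1, (len(text) - start) // 2 + 1):
            x = text[start:start + length]
            y = x[::-1]
            if text.startswith(y, start + length):
                return x, y
    return None

print(mirrored("kabbatus"))  # prints ('ab', 'ba')
```

In Bracmat the two nested loops are implicit: the pattern's two ? wildcards and backtracking do the searching.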
No regular expressions. Instead, use
@( string : pattern )
or
( tree : pattern )
                        regex     pattern
nesting & recursion     no/yes    yes
named variables         no/yes    yes
non-string subject      no        yes
greedy                  yes/no    no
Regex:
DO {stay in string world}
UNTIL (regex clear as mud)
THEN use other tool (XPath, XQuery, SQL, LINQ, CQL)
Bracmat:
tokenize input string → tree (a list, to begin with)
WHILE(more and more interesting)
{ pattern match tree
& make more interesting tree
}
Replace variable by the value of the variable.
X=6;i=0;mltpls=;
  '                                { lhs: empty string }
  ( whl
  ' ( !i+1:<10:?i
    & !mltpls ""$X*!i:?mltpls      { lhs of $: empty string }
    )
  )
: ?code;
Macro after evaluation:
lst$code;
(code=
= whl                              { ' replaced by = }
  ' ( !i+1:<10:?i
    & !mltpls 6*!i:?mltpls         { ""$X replaced by 6 }
    )
);
Uses of macro evaluation:
(1) speed up performance: tight loops, pattern matching over long lists
(2) incrementally build the “Greatest Common Pattern” that matches a given set of data structures (tree kernel)
(3) create or change code at run time
The lambda abstraction
(λx.x)y translates to
/('(x.$x))$y
For factorial and Fibonacci the hard way, see http://rosettacode.org/wiki/Y_combinator#Bracmat
Normalization steps: flatten, identity, sort, combine.
[Table: which of these steps apply to each of the operators . , whitespace + * ^ \L; neutral elements "" 0 1]
(a,b,c),(d,e),f,g → a,b,c,d,e,f,g
[Diagram: the comma tree before and after flattening]
Because of associativity
"" a b c "" "" d "" → a b c d
0+a+b+c+0+0+d+0 → a+b+c+d
1*a*b*c*1*1*d*1 → a*b*c*d
Because "" 0 and 1 are neutral elements
f+a+b+g+c+e+d → a+b+c+d+e+f+g
f*a*b*g*c*e*d → e*a*b*c*d*f*g (not a*b*c*d*e*f*g)
3*b*c+a*b+a*e^(k+j)*i → a*b+3*b*c+i*e^(j+k)*a
Because + and * are commutative
eat+the+the+zuppe → eat+2*the+zuppe
2*a*b+(a+-1*b)^2 → a^2+b^2
Because multiplication is distributive over addition
Surface Appearance
Create a new program
Canonical lay-out edit-save-reformat-reopen loop
Requirements
Build
Run code
Memory
Unicode support
SGML, XML, HTML support
Operators =.,|&: +*^\L\D'$_ (white space is an operator too!)
Prefixes [~/#<>%@`?! and !!. Also -
Parentheses and quotes override default.
Comments: { yes, not stored in memory! }
Layout: free, convert to canonical with lst
(1) From scratch. Easy: no boilerplate.
(2) Wizard project.bra creates a skeleton: description, program stub, and a facility to help you keep your code nicely formatted.
Principles:
- reasonable margins
- indented
- pair of parentheses aligned (horizont|vertic)ally
- heading operator aligned w. parentheses
Some coding errors become conspicuous.
Use a combination of a terminal window or DOS-prompt and an auto-reloading text editor.
Edit your code, save, type !r in Bracmat, (auto)reload program in editor.
{?} (diet =. !arg:? (fish|meat|chicken) ? & carnivore |
{1} !arg:? (yoghurt|milk|cheese) ? & veggie
{1} | !arg:? (fruit|nuts) ? & vegan | dontknow
{1} )
{!} diet
S 0,00 sec
{?} lst$diet
(diet=
. !arg:? (fish|meat|chicken) ?
& carnivore
| !arg
: ? (yoghurt|milk|cheese) ?
& veggie
| !arg:? (fruit|nuts) ?&vegan
| dontknow);
{!} diet
- C-compiler with standard libraries
- 32-bit or 64-bit OS
- bracmat.c and xml.c (< 20 000 lines in all)
- nothing more
E.g. compile and link with GNU-C:
gcc -std=c99 -pedantic -Wall -O3 -static -DNDEBUG -o bracmat bracmat.c xml.c
Windows, dynamically linked, 32 bit: 100Kb
Linux, statically linked, 64 bit: 1 Mb
w/o parameters: interactive (REPL).
With parameters: each parameter is evaluated, left to right.
Bracmat as embedded s/w:
JNI – call Bracmat from Java
Linked with C or C++ code, Bracmat can be called and can even call C-functions.
Nodes
Very compact, 4,8,12,16, … bytes per node
Reference counting
Structure sharing
Unreferenced nodes freed
Variables dynamically scoped (lex.bra → lexical)
UTF8 assumed, fall back to ISO-8859-1
Ø ø in Unicode, UTF-8: utf$low$chu$216 → 248
Ø ø in ISO-8859-1: asc$low$chr$216 → 248
upp$ру́ → РУ́
Robust and fast parsing, done in C.
Transformation to Bracmat expression.
No nesting of elements. Separate step.
XML/HTML entities → UTF-8
get$(" <p class='Q' enabled >Samsø<br/</p> ",MEM,HT,ML)
→ (p.(class.Q) (enabled.)) Samsø (br.,) (.p.)