languages, regular syntax, syntax tree


syntax that represent a tree purely [was X#]

On Jan 21, 3:13 am, Pascal Costanza wrote:
> LOL: http://www.xsharp.org/samples/

Today, i was nosing about some blogs, which made me come to:

in which he wrote: «Now I’m not a LISP expert, but it seems to me that the S-Expression is generally regarded as the canonical representation of the LISP AST (and many other languages for that matter). That is, the LISP syntax is as close to an AST as one can get. Perhaps the presence of macros, #, `, ‘, and , muddy the waters a bit, but not much.»

that is technically not true. Although you mentioned lisp syntax irregularities like “# , ,@ '”, but in general this sentiment that lisp has regular syntax, or that its syntax is closest to abstract syntax tree, is perpetuating a myth.

Mathematica language's syntax, XML, and XML derived general purpose languages (e.g. O:XML), are in fact more regular and closest to a pure tree representation of the abstract syntax tree.

For detail, see:

• Fundamental Problems of Lisp
(the first section; on syntax irregularity)

• The Concepts and Confusions of Prefix, Infix, Postfix and Fully Nested Notations

also note, nesting matching delimiters is not the only way to represent a tree. Indentation, such as used in Python or ascii-picture tree used to represent directories and sub dirs and files, is another form, arguably has more clarity than nesting with delimiters. As far as i known, there were a couple such proposal/spec/library with working implementation of this syntax that works for Scheme lisp.

in general, the question is what are ways to represent a tree structure visually using text. The lisp, Mathematica approach is nesting matching pairs. Ascii-picture approch is using lines to represent a tree node and indentation to represent their level. (python's syntax borrows this idea, however not in a pure form, in the sense that the source code's visual representation does not correspond to python's AST, far from it)

Note here, one of the important aspect here is VISUALLY. Because otherwise any syntax is abstract syntax tree by definition. This criterion can be formalized mathematically, by saying that the parser for such a syntax should be very simple as having just one heuristics that is based on recursion. (i dont have expertise in parsing, but in parser jargon, i think there are simple terms to describe this class of syntax, something like: its lexical grammar can be parsed by parsing expression grammar)

Another important aspect here is that the source code of lang with such syntax also needs to remain human readable and practically usable for programers to type conveniently. This is the reason, lisp has few irregular syntax introduced. Mathematica, take the approach of a layer of syntax on top of the regular. This in my opinion has 2 major advantages: A: the regularity of the syntax is unadulterated. B: arbitrary shorter, traditional, or algebraic syntax can be introduced. This is also why, python's syntax isn't a pure representation of its abstract syntax tree, else, to code “print 3+4*5” becomes:


As for XML derived languages, the problem of verbosity and nesting largely remains. The practiced solution seems to head towards a specialized editor so that programers work on the editor's simpler representation instead of the source code. (e.g. much of MathML, and conversely Microsoft Word with its XML file format)

∑ http://xahlee.org/


No comments:

Post a Comment