html6, your html5/xml simplified


On Sep 21, 10:49 pm, Alex Shinn wrote:
> On Sep 22, 1:46 am, Xah Lee wrote:
> > On Sep 21, 9:42 am, Xah Lee wrote:
> > > can anyone point me to some popular xml/sexp transformation util?
> > > btw, is there some common/standard sexp syntax for xml? by that i mean
> > > perhaps some that is named, well defined, or most popularly used in
> > > scheme community. (i vaguely recall scheme community i think plt
> > > scheme has such a format)
> > > basically ,am interested in a sexp syntax for writing xml.
> > found this http://okmij.org/ftp/Scheme/SXML.html
> > is this basically the one or are there major alternatives?
> It's by far the most widely ported Scheme XML library.
> Other libraries often use the same sxml format (including
> at least HTML and MIME parsers I've seen).
> Unfortunately, the format uses
>   (@ (name "value") ...)
> syntax for attributes, which has two disadvantages:
>   1. '@ is not a valid RnRS identifier.
>   2. The attributes are not dotted pairs, which can
>   be less efficient and doesn't play well with standard
>   alist utilities.
> I've seen a proposal to address the first issue by
> replacing "@" with "^".
> Alternately, since XML doesn't allow nested structures
> without a tag, the attributes could just be distinguished
> by their lack of a tag:  ((name "value") ...).
> --
> Alex

thanks Alex Shinn & Alexander Burger.

recently i got tired of the meandering attitude of the w3c, and now the html5 wtf-group, about html/xhtml/html5 over the past decade.

(see: 〈(Google Earth) KML Validation Fuckup〉 http://xahlee.org/comp/kml_validation.html )

Who Listens to Correctness When Authorities Meander?


When XML and XHTML came along in about 2000 with massive fanfare, we were told that XHTML would change society, or at least make the web correct and valid, and far easier and more flexible to develop for. Now it's a decade later. Sure, the web has improved, but as far as html/xhtml and browser rendering go, it's still a fuck soup of extreme complexity. 99.99% of web pages are still not valid. Major browsers still don't agree on their rendering behavior. Web dev is actually far more complex than before, involving tens or hundreds of technologies that hardly anyone even knows about. It's hard to say if it is better at all than the HTML3 days with “font” and “table” tags and a gazillion tricks. The best practical approach is still trial and error with browsers.

And now HTML5 comes along, from a newfangled hip group, with an attitude that validation is overrated: a flying fuck in the face of the XML mantra from the standards bodies, just when more and more sites were starting to use correct XHTML.

XML is a break from SGML, with many justifications for why it needed to be, and now HTML5 is a break from both SGML and XML. WTFML anyone?

so i've been thinking of starting a radical proposal.

• 〈HTML6, Your HTML/XML Simplified〉

here's the draft
HTML6, Your HTML/XML Simplified

Xah Lee, 2010-09-21

Tired of the standards bodies telling us what to do and changing their attitude? Tired of the SGML/HTML/XML/XHTML/HTML5 changes? Tire no more, here's a new proposal that will make life easier.

Introducing HTML6

HTML6 is based on HTML5, XML, and a rectified LISP syntax. More specifically, it is derived from existing work on this, SXML (http://okmij.org/ftp/Scheme/SXML.html), except that there is complete regularity at the syntax level, and it is not intended to be compatible with lisp readers. The syntax can be specified by 3 short lines of parsing expression grammar.

The aim is a far simpler syntax, 100% regularity, and a leaner, stricter format.

First of all, no error is accepted, ever. If a source code has incorrect syntax, that page is not displayed.


Here's a standard ATOM webfeed XML file.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:base="http://xahlee.org/emacs/">

<title>Xah's Emacs Blog</title>
<subtitle>Emacs, Emacs, Emacs</subtitle>
<link rel="self" href="http://xahlee.org/emacs/blog.xml"/>
<link rel="alternate" href="http://xahlee.org/emacs/blog.html"/>
<updated>2010-09-19T14:53:08-07:00</updated>
<author>
  <name>Xah Lee</name>
  <uri>http://xahlee.org/</uri>
</author>
<id>http://xahlee.org/emacs/blog.html</id>
<icon>http://xahlee.org/ics/sum.png</icon>
<rights>© 2009, 2010 Xah Lee</rights>

<entry>
<title>Using Emacs's Abbrev Mode for Abbreviation</title>
<id>tag:xahlee.org,2010-09-19:215308</id>
<updated>2010-09-19T14:53:08-07:00</updated>
<summary>tutorial</summary>
<link rel="alternate" href="http://xahlee.org/emacs/emacs_abbrev_mode.html"/>
</entry>

</feed>

Here's how it looks in html6:

〔?xml 「version “1.0” encoding “utf-8”」〕
〔feed 「xmlns “http://www.w3.org/2005/Atom” xml:base “http://xahlee.org/emacs/”」

〔title Xah's Emacs Blog〕
〔subtitle Emacs, Emacs, Emacs〕
〔link 「rel “self” href “http://xahlee.org/emacs/blog.xml”」〕
〔link 「rel “alternate” href “http://xahlee.org/emacs/blog.html”」〕
〔updated 2010-09-19T14:53:08-07:00〕
〔author
  〔name Xah Lee〕
  〔uri http://xahlee.org/〕
〕
〔id http://xahlee.org/emacs/blog.html〕
〔icon http://xahlee.org/ics/sum.png〕
〔rights © 2009, 2010 Xah Lee〕

〔entry
〔title Using Emacs's Abbrev Mode for Abbreviation〕
〔id tag:xahlee.org,2010-09-19:215308〕
〔updated 2010-09-19T14:53:08-07:00〕
〔summary tutorial〕
〔link 「rel “alternate” href “http://xahlee.org/emacs/emacs_abbrev_mode.html”」〕
〕

〕

Simple Matching Pairs For Tag Delimiters

The standard xml markup brackets are replaced by simple lisp-style matching pairs. For example, this code:

<h1>HTML6</h1>

is written as:

〔h1 HTML6〕
The delimiter used is:

Character  Unicode Code Point  Unicode Name
〔          U+3014              LEFT TORTOISE SHELL BRACKET
〕          U+3015              RIGHT TORTOISE SHELL BRACKET
XML Properties and Attributes Syntax

In xml:

<h1 id="xyz" class="abc">HTML6</h1>
In html6:

〔h1「id “xyz” class “abc”」HTML6〕
The attributes are specified by matching corner brackets 「」. The items inside are a sequence of name/value pairs, and each value must be quoted by curly double quotes “”.
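The mapping from html6 back to XML is mechanical. Below is a minimal Python sketch, assuming a hypothetical in-memory form in which an element is a (tag, attribute pairs, children) tuple and text children are plain strings; `to_xml` is an illustrative name, not part of any published html6 tooling. It relies only on the standard library's `xml.sax.saxutils` escaping helpers.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xml(node):
    """Serialize an html6-style element tree to an XML string.

    node is either a str (text) or a (tag, attrs, children) tuple, where
    attrs is a list of (name, value) pairs taken from the 「」 brackets.
    """
    if isinstance(node, str):
        return escape(node)  # escapes &, < and >
    tag, attrs, children = node
    attr_text = "".join(f" {name}={quoteattr(value)}" for name, value in attrs)
    inner = "".join(to_xml(child) for child in children)
    return f"<{tag}{attr_text}>{inner}</{tag}>"


# 〔h1「id “xyz” class “abc”」HTML6〕
h1 = ("h1", [("id", "xyz"), ("class", "abc")], ["HTML6"])
print(to_xml(h1))  # <h1 id="xyz" class="abc">HTML6</h1>
```

Because attributes are kept as an ordered list of pairs, their order in the 「」 brackets is preserved in the XML output.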

Escape Mechanisms

To include the 〔tortoise shell〕 delimiters in data, use “&#x3014;” and “&#x3015;”; similarly, use “&#x300c;” and “&#x300d;” for the 「corner brackets」.
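A minimal Python sketch of this escaping rule, assuming that only the four bracket characters need escaping and that references use the XML-style hex form with a trailing semicolon; `escape_html6_text` is a hypothetical helper name. The draft does not say how curly quotes inside attribute values are escaped, so they are left alone here.

```python
# Hypothetical escape table: each html6 delimiter maps to its hex reference.
DELIMITER_ESCAPES = str.maketrans({
    "〔": "&#x3014;",  # LEFT TORTOISE SHELL BRACKET
    "〕": "&#x3015;",  # RIGHT TORTOISE SHELL BRACKET
    "「": "&#x300c;",  # LEFT CORNER BRACKET
    "」": "&#x300d;",  # RIGHT CORNER BRACKET
})


def escape_html6_text(text: str) -> str:
    """Replace delimiter characters in data with character references."""
    return text.translate(DELIMITER_ESCAPES)


print(escape_html6_text("see 〔note〕"))  # see &#x3014;note&#x3015;
```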

Unicode; No More CDATA and Entities “&”

There are no entities, except Unicode character references in hexadecimal format “&#x‹unicode code point in hexadecimal›;”.

For example, “&amp;” is not allowed.
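Decoding is equally small. This Python sketch assumes a reference must end with a semicolon, as in XML, and treats any other use of “&” (named entities, decimal references, a bare ampersand) as an error, in keeping with the rule that no error is accepted; `decode_references` is a hypothetical name.

```python
import re

# Assumption: references are "&#x" + 1 to 6 hex digits + ";", as in XML.
HEX_REF = re.compile(r"&#x([0-9A-Fa-f]{1,6});")


def decode_references(text: str) -> str:
    """Expand &#x...; references; reject every other use of "&"."""
    out = []
    pos = 0
    while True:
        amp = text.find("&", pos)
        if amp == -1:
            out.append(text[pos:])
            return "".join(out)
        m = HEX_REF.match(text, amp)
        if m is None:
            raise ValueError(f"entity or bare '&' not allowed at offset {amp}")
        out.append(text[pos:amp])
        # chr() raises ValueError for code points above U+10FFFF.
        out.append(chr(int(m.group(1), 16)))
        pos = m.end()


print(decode_references("&#x3014;h1 HTML6&#x3015;"))  # 〔h1 HTML6〕
```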

Treatment of Whitespace

Basically identical to XML.

Char Encoding; UTF-8 and UTF-16 Only

Source code must be UTF-8 or UTF-16, only. Nothing else.
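One way a reader could enforce this, sketched in Python under the assumption that UTF-16 input is recognized by its byte order mark and everything else must decode as strict UTF-8 (an optional UTF-8 BOM is tolerated). The draft does not say how the encoding is detected, and `decode_source` is a hypothetical name.

```python
import codecs


def decode_source(data: bytes) -> str:
    """Decode html6 source bytes as UTF-8 or UTF-16 only; raise otherwise."""
    # The UTF-32 LE BOM begins with the UTF-16 LE BOM, so rule it out first.
    if data.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        raise ValueError("UTF-32 source is not allowed")
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")  # the BOM selects byte order and is dropped
    # Everything else must be valid UTF-8; "utf-8-sig" drops an optional BOM.
    return data.decode("utf-8-sig")  # UnicodeDecodeError is a ValueError


print(decode_source("〔h1 HTML6〕".encode("utf-16")))  # 〔h1 HTML6〕
```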

File Name Extension

File name extension is “.xml6”.


i got so tired of the w3c and wtf-group “standard bodies” and their continuously changing attitude about what html/xhtml should be that i cooked up this.

it'd be nice if we could just adopt sxml, but the various lisp ones i found have problems: they ignore html/xml as a syntax in its own right. instead, the lispers just wanted something compatible with lisp readers, not really a standalone syntax.

the lisp sexp syntax has a bunch of problems, foremost that it's not regular. (the xml/xhtml movement fixed this for html to some degree, but that is now thwarted by the new html5 wtf-group with google and apple backing, nay, mostly just google + apple.)

• 〈Fundamental Problems of Lisp〉

also, xml as a textual representation of a tree has a quirk: each node has this special thing called “properties” or “attributes”, which are not nodes/branches, but rather info attached to a node. The standard sexp representations for this are inconsistent, e.g.

(tagX :propA aValue :propB bValue ...)

at the syntax level, the above is no different from this:

(a b c d e ...)

which means that, syntactically, b c d e are just nodes.

other ways to represent xml attributes, from sxml and the formats shown in Alex's and Alexander's messages:

(a ((x . "1") (y . "2")) (b NIL (c ((y . "2")) "Mumble")) (d))

(@ (name "value") ...)

both have the same issue. That is, there's no syntax level distinction of what's a node and what's a node's property.

e.g. in this

(a ((x . "1") (y . "2")) (b NIL (c ((y . "2")) "Mumble")) (d))

the ((x . "1") (y . "2")) can be interpreted as a node by itself, whose first element is again a node. Also, it uses lisp's special cons syntax (x . "1"), which is itself ambiguous at the syntax level: it can be considered as a node named x with 2 branches “.” and “"1"”, or as a node named “cons” with 2 branches “x” and “"1"”.

in this:

(@ (name "value") ...)

again, at the syntax level this whole thing is simply a node named “@”. Only at the semantic level is it taken as the properties of a node, because of the special head “@”.
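The ambiguity can be seen by holding an SXML-style element as plain nested Python lists. A format-agnostic tree walk counts the “@” list as a child; only code that knows the “@” convention can skip it. The `Element` class at the end is a hypothetical model of the html6 alternative, in which attributes live in their own field rather than in the child list; none of these names come from an existing library.

```python
from dataclasses import dataclass, field

# SXML-style element as nested lists: the attribute list sits in a child
# position and is recognizable only by its "@" head.
sxml = ["h1", ["@", ["id", "xyz"], ["class", "abc"]], "HTML6"]


def children_generic(node):
    """A format-agnostic tree walk: everything after the head is a child."""
    return node[1:]


def children_sxml(node):
    """A walk that knows the convention and skips the "@" list."""
    return [c for c in node[1:] if not (isinstance(c, list) and c[:1] == ["@"])]


print(len(children_generic(sxml)))  # 2: the "@" list is counted as a node
print(children_sxml(sxml))          # ['HTML6']


# Hypothetical model of the html6 alternative: attributes from 「」 are a
# separate field, so no tree walk can mistake them for child nodes.
@dataclass
class Element:
    tag: str
    attrs: list = field(default_factory=list)     # [(name, value), ...]
    children: list = field(default_factory=list)  # str or Element


h1 = Element("h1", [("id", "xyz"), ("class", "abc")], ["HTML6"])
print(h1.children)  # ['HTML6']
```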

So, in conceiving html6, i thought a solution for getting rid of the syntax ambiguity of node vs attributes is to use a special bracket for the properties/attributes of a node. e.g.

In xml:

<h1 id="xyz" class="abc">HTML6</h1>

In html6:

〔h1「id “xyz” class “abc”」HTML6〕

one thing about this html6 is that it is intentionally separate from sexp in the lisp world. The key is that the syntax is designed specifically as a 2D textual representation of a tree, with an attribute bracket that attaches a limited form of info (a sequence of pairs) to any node, to fit existing xml.

the advantage of this is that it should be extremely easy to parse, perhaps with just 3 lines of parsing expression grammar, and a parser can easily be written in perl, python, ruby... without entailing lisp quirks.
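To give a feel for how small such a parser is, here is a hedged Python sketch of a strict recursive-descent reader. The grammar in its comments is an assumed reconstruction; the draft does not publish its 3-line PEG. Further assumptions: tag names are runs of characters that are neither whitespace nor delimiters, whitespace right after a tag name is a separator and is dropped, other text is kept verbatim (character references are left undecoded), and any syntax error raises an exception instead of being recovered from. `parse_document` and `ParseError` are hypothetical names.

```python
# Assumed grammar (not published in the draft), roughly:
#   element <- "〔" name ws* attrs? (element / text)* "〕"
#   attrs   <- "「" (ws* name ws* "“" (!"”" .)* "”")* ws* "」"
#   text    <- (![〔〕「」] .)+      name <- (!(ws / [〔〕「」“”]) .)+

DELIMS = "〔〕「」"
NAME_STOP = DELIMS + "“”"


class ParseError(ValueError):
    """Raised on any syntax error; html6 accepts no malformed input."""


def _skip_ws(s, i):
    while i < len(s) and s[i].isspace():
        i += 1
    return i


def _expect(s, i, ch):
    if i >= len(s) or s[i] != ch:
        raise ParseError(f"expected {ch!r} at offset {i}")
    return i + 1


def _name(s, i):
    start = i
    while i < len(s) and not s[i].isspace() and s[i] not in NAME_STOP:
        i += 1
    if i == start:
        raise ParseError(f"expected a name at offset {start}")
    return s[start:i], i


def _attrs(s, i):
    """Parse 「name “value” ...」 into a list of (name, value) pairs."""
    i = _expect(s, i, "「")
    pairs = []
    while True:
        i = _skip_ws(s, i)
        if i < len(s) and s[i] == "」":
            return pairs, i + 1
        name, i = _name(s, i)
        i = _expect(s, _skip_ws(s, i), "“")
        end = s.find("”", i)
        if end == -1:
            raise ParseError(f"unterminated attribute value at offset {i}")
        pairs.append((name, s[i:end]))
        i = end + 1


def _element(s, i):
    """Parse 〔tag 「attrs」 content〕 into a (tag, attrs, children) tuple."""
    i = _expect(s, i, "〔")
    tag, i = _name(s, i)
    i = _skip_ws(s, i)  # the separator after the tag name is dropped
    attrs = []
    if i < len(s) and s[i] == "「":
        attrs, i = _attrs(s, i)
    children = []
    while True:
        if i >= len(s):
            raise ParseError(f"element 〔{tag} is never closed")
        ch = s[i]
        if ch == "〕":
            return (tag, attrs, children), i + 1
        if ch == "〔":
            child, i = _element(s, i)
            children.append(child)
        elif ch in "「」":
            raise ParseError(f"unescaped {ch!r} at offset {i}")
        else:
            start = i
            while i < len(s) and s[i] not in DELIMS:
                i += 1
            children.append(s[start:i])  # text is kept verbatim


def parse_document(src):
    """Parse whitespace-separated top-level elements, such as 〔?xml〕 then 〔feed〕."""
    nodes, i = [], _skip_ws(src, 0)
    while i < len(src):
        node, i = _element(src, i)
        nodes.append(node)
        i = _skip_ws(src, i)
    return nodes


print(parse_document("〔link 「rel “self” href “http://xahlee.org/emacs/blog.xml”」〕"))
# [('link', [('rel', 'self'), ('href', 'http://xahlee.org/emacs/blog.xml')], [])]
```

The whole reader is a handful of small functions with no backtracking, which is the practical payoff of keeping attributes in their own bracket type: the parser never has to guess whether a nested group is a node or a property list.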

any thoughts about flaws?

it's just a personal fantasy. ☺

Xah ∑ xahlee.org ☄
