youtube embed code invalid html


When you go to YouTube, it gives an embed code such as

<iframe title="YouTube video player"
class="youtube-player" type="text/html" width="640" height="385"

Note that the

type="text/html"

is not valid HTML. There's no such attribute for the iframe tag.

Could anyone explain why Google put that there? I guess it's for some practical reason, but I can't guess what.
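
One way to double-check which attribute is the odd one out is to compare the tag against the attribute list HTML 4.01 defines for IFRAME. Here is a minimal Python sketch of that check. The helper name `unknown_iframe_attrs` is made up for illustration, and the snippet only carries the attributes quoted above, with the tag closed so the parser accepts it; the real embed code has more.

```python
from html.parser import HTMLParser

# Attributes HTML 4.01 defines for IFRAME: the core attributes
# (id, class, style, title) plus the element-specific ones.
HTML401_IFRAME_ATTRS = {
    "id", "class", "style", "title",
    "longdesc", "name", "src", "frameborder", "marginwidth",
    "marginheight", "scrolling", "align", "height", "width",
}


class IframeAttrChecker(HTMLParser):
    """Collects iframe attribute names that are not in the HTML 4.01 list."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            for name, _value in attrs:
                if name not in HTML401_IFRAME_ATTRS:
                    self.unknown.append(name)


def unknown_iframe_attrs(markup):
    # Hypothetical helper: returns the attribute names the checker flagged.
    checker = IframeAttrChecker()
    checker.feed(markup)
    checker.close()
    return checker.unknown


# Only the attributes quoted in this post; the closing of the tag is assumed.
snippet = ('<iframe title="YouTube video player" class="youtube-player" '
           'type="text/html" width="640" height="385"></iframe>')
print(unknown_iframe_attrs(snippet))  # ['type']
```

This only checks attribute names against one spec version; it is not a full validator.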

PS you can get the embed code by going here http://www.youtube.com/watch?v=QRvVzaQ6i8A

I also posted this question elsewhere, but I'm hoping some Google employee could give us some good insight.

Xah Lee


emacs describe-char missing info on unicode thumb up char

there are these unicode symbols

ok hand sign 👌 #x1f44c

thumb up 👍 #x1f44d
thumb down 👎 #x1f44e

when calling describe-char on them, it doesn't give their names.

Is this a bug? Does it happen to just a few chars, or perhaps to all chars outside the Basic Multilingual Plane? I know that many chars outside of the BMP don't have this problem.
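
For comparison outside Emacs, here is a small Python sketch that looks up the names of the same three code points in Python's bundled Unicode database. The helper name `char_name` is made up for illustration. These three chars were added in Unicode 6.0, so one guess (only a guess) is that describe-char is reading name data older than that.

```python
import unicodedata


def char_name(codepoint):
    # Hypothetical helper: returns the Unicode name, or a marker string
    # when the bundled database has no name for this code point.
    return unicodedata.name(chr(codepoint), "<no name in this database>")


for cp in (0x1F44C, 0x1F44D, 0x1F44E):
    print(f"U+{cp:04X} {chr(cp)} {char_name(cp)}")
```

If the names show up here but not in Emacs, the difference is most likely in the version of the Unicode data each program ships with.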

Xah ∑ http://xahlee.org/


what's the de facto practice on ampersand encoding in html


By the HTML spec, an ampersand should be encoded as

&amp;

But of course a lot of the web doesn't do that. Here's an example of an ad widget from Amazon:


note that the ampersand is not encoded.

My question is, for those who work a lot with commercial sites, work in a company, or work with many widget codes: do most of these sites actually encode the ampersand?

(In other words, what percentage of the top 1k sites try to encode the ampersand properly when it is in a URL?)
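
To make "encoded properly" concrete, here is a minimal Python sketch of escaping a URL before putting it into an href attribute. The URL and the helper name `href_attr` are made up for illustration; `html.escape` is from the Python standard library.

```python
from html import escape, unescape


def href_attr(url):
    # Hypothetical helper: escape &, <, > and quotes so the URL is safe
    # to place inside a double-quoted HTML attribute value.
    return 'href="' + escape(url, quote=True) + '"'


url = "http://example.com/search?q=emacs&lang=en"  # made-up example URL
print(href_attr(url))
# href="http://example.com/search?q=emacs&amp;lang=en"

# A browser decodes the entity back, so the link target is unchanged.
print(unescape("http://example.com/search?q=emacs&amp;lang=en") == url)
```

Browsers usually recover from a bare ampersand in a URL, which is probably why so many widget codes get away with not encoding it.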


elisp 23.2 doc on regex on multibyte char still correct?



In the elisp doc for Emacs 23.2, the section on regex
has a part that talks about multibyte chars.

Is that info still correct?


This is edition 3.0 of the GNU Emacs Lisp Reference Manual,
corresponding to Emacs version 23.2.

(elisp) Regexp Special

The beginning and end of a range of multibyte characters must be in
the same character set (*note Character Sets::). Thus,
`"[\x8e0-\x97c]"' is invalid because character 0x8e0 (`a' with
grave accent) is in the Emacs character set for Latin-1 but the
character 0x97c (`u' with diaeresis) is in the Emacs character set
for Latin-2. (We use Lisp string syntax to write that example,
and a few others in the next few paragraphs, in order to include
hex escape sequences in them.)

If a range starts with a unibyte character C and ends with a
multibyte character C2, the range is divided into two parts: one
is `C..?\377', the other is `C1..C2', where C1 is the first
character of the charset to which C2 belongs.

You cannot always match all non-ASCII characters with the regular
expression `"[\200-\377]"'. This works when searching a unibyte
buffer or string (*note Text Representations::), but not in a
multibyte buffer or string, because many non-ASCII characters have
codes above octal 0377. However, the regular expression
`"[^\000-\177]"' does match all non-ASCII characters (see below
regarding `^'), in both multibyte and unibyte representations,
because only the ASCII characters are excluded.

A character alternative can also specify named character classes
(*note Char Classes::). This is a POSIX feature whose syntax is
`[:CLASS:]'. Using a character class is equivalent to mentioning
each of the characters in that class; but the latter is not
feasible in practice, since some classes include thousands of
different characters.
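
The point in the quoted text about `"[\200-\377]"` versus `"[^\000-\177]"` is not specific to Emacs. Here is a rough illustration in Python (an assumption on my part: Python's `re` on ordinary strings stands in for a multibyte buffer, and this says nothing about Emacs internals or about the charset rule for ranges). The sample string is made up.

```python
import re

# a, then U+00E0 (a with grave), U+017C (z with dot above), U+4E2D (CJK)
text = "a\u00e0\u017c\u4e2d"

# The range 0x80-0xFF only catches non-ASCII chars with codes up to 0xFF.
narrow = re.findall(r"[\x80-\xff]", text)

# Negating the ASCII range catches every non-ASCII char.
broad = re.findall(r"[^\x00-\x7f]", text)

print(narrow)  # ['à']
print(broad)   # ['à', 'ż', '中']
```

So at least that paragraph of the manual describes behavior that holds in general: excluding ASCII is the reliable way to match "everything non-ASCII".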


Xah ∑ http://xahlee.org/ ☄