2010-09-03

hunspell frustration

On Sep 3, 12:27 pm, Simon Brooke wrote:
> On Fri, 03 Sep 2010 05:58:06 -0700, Xah Lee wrote:
> > On Sep 3, 4:11 am, p...@informatimago.com (Pascal J. Bourguignon) wrote:
> >> Xah Lee writes:
> >> > Verdict: yay for Clojure!
>
> >> Whatever.
>
> >> But judging from your "how to get list of vectors with value from file
> >> content..."  question, one would think that after all the years you've
> >> been spending criticizing everything about programmers and programs,
> >> you'd at least have some sound notions of programming, but you seem
> >> actually to lack even the most basic programming notions.
>
> > lol Pascal.
>
> > asking simple language questions is no indication of one's knowledge in
> > computer science nor expertise of the language.
>
> > Knuth, if he were to program in say java, lisp, javascript, php, or even
> > html, he probably would be a beginner.
>
> I really, really don't think so. Software has remarkably few basic
> concepts, and they have not changed over the years. Learning what's where
> in a library takes time, yes; minor differences of syntax between related
> languages can also take time to get used to. But there have been no
> fundamentally new innovations on software in the past forty years. Prolog
> and Smalltalk are probably the youngest really innovative languages.

Hi Simon,

i think you are just pushing your pet opinion that languages haven't changed much. That has not much to do with the thread.

Here's a collection of essays on software engineering.

〈The Tech Geekers and Software Engineering〉 http://xahlee.org/UnixResource_dir/paradigm.html

One subsection has about 14 essays related to software complexity. Here are the essay titles:

* A Exhibition Of Tech Geekers Incompetence: Emacs whitespace-mode
* A Emacs Frustration (blogger package)
* Emacs's Menu Usability Problem
* Emacs Spell Checker Problems
* A Record of Frustration in IT Industry
* Hunspell Path Pain
* The Complexity And Tedium of Software Engineering
* Mac OS X SSH Session Disconnection
* Graphics Programing Pains
* Software Dependency Complexity: Fink, Unison
* URL Percent Encoding and Unicode
* URL Percent Encoding and Ampersand Char
* AutoHotkey Path Problem; Windows Shortcut Path


Here's one i wrote recently:
〈Hunspell Path Pain〉
http://xahlee.org/comp/hunspell_spell_path_pain.html

it shows how a trivial problem, not even about programing but just about using a piece of software, ends up taking 3 or 4 hours. We programers spend all day on this kind of thing.

in recent years, occasionally i try to write up and document the pain and frustration as a programer, in a concrete way.

--------------------------------------------------
Hunspell Path Pain

Xah Lee, 2010-06-18

Am slightly frustrated with hunspell. Spent about 5 hours yesterday and today on it. Am trying to get it to work with emacs's speck-mode, on Windows.

Being a kinda thorough person, i started to work on this problem from the ground up, by first trying to read the doc, become familiar with its basic usage and syntax, and get it to run on the command line only. Once i am familiar with it on the command line, i can move on to understand the integration and config issues with emacs and speck-mode, by getting it to work in my personal emacs setup. Then, i can move on to the next step, of working on ErgoEmacs's installation elisp config files. Great and careful master plan. (actually, i went one step more thorough than this, by first understanding aspell, which i did yesterday. The result is here: aspell Tutorial.)

So, first job is to get it to run on the command line.

The path to hunspell on my machine is at:

C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell\

Change path into the dir, then i can run:

hunspell -d en_US

Good. Now i need to set the hunspell executable path in my env var PATH so i can run it elsewhere. Using Windows's cmd.exe, it's like this:

set PATH="C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell\;%PATH%"

Easy. This is just for the current session. I can set it permanently later using “setx” or PowerShell once i get all env var issues resolved. Now, i can call hunspell from elsewhere. However, now it can't find the dictionary file:

c:\Users\xah>hunspell
Can't open affix or dictionary files.

c:\Users\xah>hunspell -d en_US
Can't open affix or dictionary files.

This works:

c:\Users\xah>hunspell -d "C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell\en_US"

Hunspell 1.2.8
^C
c:\Users\xah>

According to the man page, there are 3 env vars. Here i quote the section of the man page as it is written:

DICTIONARY
Similar to -d.

DICPATH
Dictionary path.

WORDLIST
Equivalent to -p.

I spent 2 hours trying a combination of the following variations:

set DICTIONARY="C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell\en_US"
set DICPATH="C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell\en_US"

set DICTIONARY="en_US"
set DICPATH="C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell"
set DICPATH="C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell\en_US"

set DICTIONARY=""
set DICPATH="C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell\en_US"

set DICTIONARY="C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell\en_US"
set DICPATH=""

None of them work. Also, i got fancy and imagined maybe the DICPATH in the manual is a typo, maybe it should be DICTPATH? No.

Its man page, in the tradition of unix, is the most fucking shit possible. The syntax of the command hunspell is also shit, like every unix command line program.

The “-d” option can be either like “en_US” or the full path to the file but sans the suffix “.dic” and “.aff”. WTF? (note that “en_US” involves 2 files, the dictionary file “en_US.dic” and the affix file “en_US.aff”.) Further, it can be a sequence of dictionaries, separated by comma, and seems like space after comma is not allowed. So what if you want to give multiple dictionaries with full path that has spaces in them? (it is probably impossible, to have a mathematically precise yet simple spec, on how the program takes the parameter.)
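To make the two forms of “-d” concrete, here is a minimal Python sketch of how the description above reads: split the value on commas, then append “.aff” and “.dic” to each entry. It is only a reading of the behavior described here, not hunspell's actual parsing code; the function names, and the second dictionary “de_DE” in the usage note, are made up for illustration.

```python
import os


def dictionary_files(d_option):
    """Split a -d value on commas and return the (.aff, .dic) pair for each entry."""
    pairs = []
    for base in d_option.split(","):
        # Each entry is either a bare name like en_US or a full path
        # without suffix; both forms get the same two suffixes appended.
        pairs.append((base + ".aff", base + ".dic"))
    return pairs


def missing_files(d_option):
    """Return the expected files that do not exist, relative to the current dir."""
    return [
        path
        for pair in dictionary_files(d_option)
        for path in pair
        if not os.path.isfile(path)
    ]
```

For example, dictionary_files(r"C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell\en_US,de_DE") gives one pair with the full path and one pair with the bare name. Under this reading, spaces inside a path are harmless as long as the whole value reaches the program as one argument; a comma inside a path would be the real problem.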

If you search the web for the exact error message “Can't open affix or dictionary files.”, there are 392 results, from all over: debian, redhat, fedora, etc. What kinda incompetent shit created this situation?

Also, the manual link at home page http://hunspell.sourceforge.net/ is a “404 Error – Page Not Found”. Typical of open source tech geeker's quality, of those elite programers so proud of calling themselves the idiotic term “hackers”. Sure, they can write complex programs, but do these idiots have a minimal concept of quality? Maybe they are not very endowed in the department of writing documentation, understandable, but do they at least TRY to have anything working well? What possible incompetence can explain the big fat manual link on the home page being broken? How long has it been broken?

Eventually, it hit me. Maybe the port to Windows is buggy, in that it doesn't check session env vars but only looks in the registry? So, i did:

setx DICTIONARY "en_US"
setx DICPATH "C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell"

That solved it. It was while actually writing this rant, at this point, that i got it to work.

God, writing things out helps. Helps my anger, helps my thinking.

Now, is my incomplete understanding of Windows Environment Variables to blame, or is this a bug in the hunspell Windows port?
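One thing worth ruling out before blaming the port: in cmd.exe, set NAME="value" stores the double quotes as part of the value, while setx NAME "value" stores it without them, and setx only affects sessions started afterwards. So the two experiments above differ in more than session versus registry. The Python sketch below is one way to test that, under stated assumptions: it passes DICPATH and DICTIONARY to a child process with any outer quotes stripped, and runs “hunspell -l” (list misspelled words from standard input). The helper names are hypothetical, and whether the quotes were the actual cause here is an assumption, not something established above.

```python
import os
import subprocess


def strip_outer_quotes(value):
    """Remove one pair of surrounding double quotes, if present."""
    if len(value) >= 2 and value[0] == '"' and value[-1] == '"':
        return value[1:-1]
    return value


def hunspell_env(dicpath, dictionary, base_env=None):
    """Return a copy of the environment with DICPATH and DICTIONARY set, unquoted."""
    env = dict(os.environ if base_env is None else base_env)
    env["DICPATH"] = strip_outer_quotes(dicpath)
    env["DICTIONARY"] = strip_outer_quotes(dictionary)
    return env


def check_words(words, env, exe="hunspell"):
    """Run `hunspell -l` with the given environment; return the words it reports."""
    # Assumes the hunspell executable can be found on PATH (or pass a full path).
    result = subprocess.run(
        [exe, "-l"],
        input="\n".join(words) + "\n",
        env=env,
        capture_output=True,
        text=True,
    )
    return result.stdout.split()
```

If check_words(["hello", "wrold"], hunspell_env(r"C:\Program Files (x86)\ErgoEmacs 1.8.1\hunspell", "en_US")) runs without the “Can't open affix or dictionary files.” error, then the port does read ordinary process env vars, and the quotes become the more likely suspect.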

PS Note that Windows shell is also a deeply layered baggage of shit. Note the syntax of “set” and “setx” are different. In fact “setx” is a separate program (setx.exe) bolted on alongside cmd.exe, which itself has gone through several evolutions and reincarnations over the past 20 years. But that's another tale. Glad Microsoft has PowerShell bundled in Windows 7.

Xah ∑ http://xahlee.org/ ☄
