Archive for January 7th, 2008

Spellchecking in Python

January 7, 2008

FWIW, here’s the script I threw together to extract the wordlist I mentioned in the previous post:

#! /usr/bin/env python2.5

from __future__ import with_statement

import os
import re
import sys

from optparse import OptionParser

def worditer(wordsin, dict = None):
    r = re.compile(r'[#&]')
    if dict:
        cmd = 'aspell -a --lang=%s' % dict
    else:
        cmd = 'aspell -a'
    i, o = os.popen2(cmd)
    # skip the version banner aspell prints when it starts in -a mode
    o.readline()
    for w in wordsin:
        if w:
            i.write(w + '\n')
            i.flush()
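            result = o.readline()
            if result and result != '\n':
                # each reply is followed by a blank line; consume it
                o.readline()
                # '&' and '#' both mark a word aspell does not know
                if r.match(result):
                    # accept the word for this session, so it is
                    # reported only once
                    i.write('@%s\n' % w)
                    yield w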

def dowords(wordsin, outstr, dict):
    for w in worditer(wordsin, dict):
        outstr.write(w + '\n')

def filewordsiter(filenames):
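    # split on runs of non-word characters ('\W*' can also match the
    # empty string, which newer Pythons treat as a split point)
    regex = re.compile(r'\W+')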
    for fname in filenames:
        with open(fname) as f:
            for line in f:
                for w in regex.split(line):
                    yield w

def dofiles(filenames, outstream, dict):
    dowords(filewordsiter(filenames), outstream, dict)

def main():
    parser = OptionParser()
    parser.add_option('-d', '--dict', dest = 'dict',
                        help = 'Dictionary to use')
    parser.add_option('-o', '--out', dest = 'outfile',
                        help = 'Output file, stdout if none')
    options, filenames = parser.parse_args()
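    if options.outfile:
        outstr = open(options.outfile, "w")
    else:
        outstr = sys.stdout
    try:
        dofiles(filenames, outstr, options.dict)
    finally:
        # close the output file, but leave stdout alone
        if outstr is not sys.stdout:
            outstr.close()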

if __name__ == '__main__':
    main()
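
The whole thing hinges on the first character of each reply that aspell -a sends back: '*' for a word it knows, '&' for a misspelling with suggestions, '#' for a misspelling with none. Here's a small sketch of just that check, in Python 3 rather than the 2.5 the script targets, and run against made-up reply lines so it doesn't need aspell installed. The helper name misspelled_word and the sample words are mine, purely for illustration.

```python
import re

# First character of a reply line in aspell's ispell-compatible pipe mode:
#   '*'  the word is known
#   '&'  misspelled, suggestions follow
#   '#'  misspelled, no suggestions
MISSPELLED = re.compile(r'[#&]')


def misspelled_word(reply):
    """Return the flagged word from one reply line, or None if none is flagged.

    Hypothetical helper, not part of the script above.
    """
    if not MISSPELLED.match(reply):
        return None
    # Both '& word count offset: ...' and '# word offset' put the word second.
    return reply.split()[1]


# Illustrative replies, typed in by hand rather than captured from aspell.
replies = [
    "*",
    "& teh 3 0: the, tea, ten",
    "# xyzzyq 0",
]

flagged = [w for w in map(misspelled_word, replies) if w is not None]
print(flagged)  # ['teh', 'xyzzyq']
```

The script itself only needs the yes/no answer, since it already knows which word it just sent; pulling the word out of the reply is only here to make the line format visible.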

Precious Bane

January 7, 2008

<MILD SPOILER ALERT>

I’ve now read Precious Bane, and as I mentioned before, I’m shocked to find that I loved it. It’s not what I generally think of as my cup of tea. I like historical novels as much as the next guy, if not more, but I tend to prefer them less rustic and more ironic and subversive. At the very least I like a sea battle or two. Subjects like, oh, illiterate farmers destroyed by single-minded obsessive ambition, and their harelipped sisters finding love through adversity, tend not to interest me so much.

Obviously what makes PB so riveting is the writing. I have no idea whether Mary Webb got early-19th-century Shropshire dialect right, but it hardly matters; it’s completely convincing to early-21st-century me. PB sucks you in linguistically the way some of Anthony Burgess’s books (A Clockwork Orange, A Dead Man in Deptford) do. [Good Lord, I just compared Mrs Webb to Anthony Burgess, and Precious Bane to A Clockwork Orange; have I lost my senses?]

Atmosphere is more than dialect, of course, and there’s much else right in PB’s atmosphere. I particularly liked the almost-medieval worldview of the Shropshire yeomen, steeped equally in the Bible and in old country superstition.

Yes, I can see how the whole thing cries out for parody, but really, the best subjects for parody are often great in themselves.

Addendum number one: the BBC made a movie of Precious Bane starring what must be a very well-cast young Clive Owen and Janet McTeer; unfortunately, despite having been shown on Masterpiece Theatre, it seems not to be available here.

Addendum number two: I ran the text of PB through a spell-checker; the results (unfiltered for proper names, mild variant spellings, and the like, and with no thought given to contractions and the occasional funky diacritic) are here.

Django, AJAX, Scriptaculous

January 7, 2008

I looked a little at Scriptaculous this weekend. I like it; it seems easy to use, and nice and concise. It’s more focussed on glitzy effects (and I mean that in a good way) and less on widgets than the much larger Dojo; it also struck me (quite possibly wrongly, I’m just at the poking-around stage) as easier to figure out and use. Both effects and AJAX requests are wrapped pretty nicely; you have to write very little code to use them.

Unlike Dojo it does pollute the global JS namespace a bit (as mentioned here); that might be annoying for people planning to use more of their own JS than I’m ever likely to.

Here’s what appears to be a nice Scriptaculous-in-Django tutorial, not that I’ve gone through it in any detail.

There seems to have been much debate in Django circles about whether to “include AJAX” in Django, which would (I assume) mean bundling a toolkit and providing a set of tags that wrap it, à la Rails and Scriptaculous. The Django developers are reluctant to do that, not wanting to commit to a toolkit, and pointing out that there’s really not that much to wrap. Now that I at least know what they’re talking about, I can sorta see their point. Which doesn’t mean some Rails-like toolkit-wrapping tags wouldn’t be nice.

I’m working (in a slow and desultory fashion) on something where I might actually make some use of this stuff; so maybe I’ll be able to form an informed opinion in a few weeks.
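
To put the “not that much to wrap” point in concrete terms, here's a rough sketch of about all a toolkit-wrapping helper would have to do: spit out one line of JavaScript. It's plain Python with no Django in it. The function name ajax_updater_js, the element id and the URL are all made up for illustration, and it assumes the page has already loaded Prototype, which Scriptaculous sits on and which supplies Ajax.Updater.

```python
import json


def ajax_updater_js(container_id, url, method="get"):
    """Return the JavaScript for a single Prototype Ajax.Updater call.

    Hypothetical helper: roughly what a toolkit-wrapping template tag
    might emit. json.dumps takes care of quoting the string arguments.
    """
    options = json.dumps({"method": method})
    return "new Ajax.Updater(%s, %s, %s);" % (
        json.dumps(container_id),
        json.dumps(url),
        options,
    )


# "results" and "/search/" are placeholder values.
print(ajax_updater_js("results", "/search/"))
# new Ajax.Updater("results", "/search/", {"method": "get"});
```

A real tag would presumably also wrap the call in an event handler or a script element, but that's template text rather than logic, which I take to be the Django developers' point.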