Posts Tagged ‘python’

Django Testing

November 27, 2008

Like many a geek, I’m a lazy bastard, so it’s only recently I’ve gotten around to writing unit tests for my tiny website (I at least have the excuse of doing this only for jollies; I’m no professional web developer). I had vaguely assumed that writing tests would be more trouble than it’s worth, and that it would be difficult to test Really Important Stuff anyway. I had also vaguely assumed that those vague assumptions were almost certainly wrong, and I was a contemptible fool for not having written the tests up front.

It turns out, no surprise, that the latter vague assumption was correct. Python’s doctest and unittest frameworks are already relatively simple, and the django testing framework makes them simpler still (there’s a bit of annoying boilerplate to figure out in python’s raw unittest, which the django framework thoughtfully hides). The setup for test databases is especially nice.

[Mind you, in my first attempt I did somehow manage to blow away my local copy of my database—not just the test database, but the real one. I never did figure that one out…]

Embarrassingly, one of the first tests I wrote turned up a bug. Not to be wondered at, I suppose.

One note, on the very slim chance someone finds it useful. The django test framework looks for tests in each app’s models and tests modules. At first I was annoyed by this: I like to put doctests in other places, as they really belong in the docstrings of the functions and classes they test. But (and you already know this if you’re a better python programmer than I) there’s a trivial way around this. The unittest and doctest modules already coexist awfully well. doctest.DocTestSuite creates a unittest.TestSuite from a module’s doctests, and this can be amalgamated with unittests by defining a function called suite in the test module, which django will look for. That is, put something like this in tests.py or tests/__init__.py:

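import doctest
import unittest
import myapp.forms, myapp.views

def suite():
    # An easy way of finding all the unittests in this module
    suite = unittest.TestLoader().loadTestsFromName(__name__)
    # Add the doctests from the modules where they actually live
    for mod in myapp.views, myapp.forms:
        suite.addTest(doctest.DocTestSuite(mod))
    return suite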
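
For anyone who wants to try the idea outside a Django project, here is a minimal stdlib-only sketch. The `helpers` module, the `double` function and the `DoubleTest` class are all made up for illustration; the module is built in memory purely so that the block is self-contained, standing in for something like myapp.views.

```python
import doctest
import types
import unittest

# Stand-in for a module like myapp.views: a function whose docstring
# carries a doctest. Built in memory only so the sketch is self-contained.
HELPERS_SOURCE = '''
def double(x):
    """
    Return twice the argument.

    >>> double(2)
    4
    """
    return 2 * x
'''
helpers = types.ModuleType("helpers")
exec(HELPERS_SOURCE, helpers.__dict__)


class DoubleTest(unittest.TestCase):
    """An ordinary unittest, as it might appear in tests.py."""

    def test_double_string(self):
        self.assertEqual(helpers.double("ab"), "abab")


def suite():
    # Start from the unittest-style tests...
    combined = unittest.TestLoader().loadTestsFromTestCase(DoubleTest)
    # ...and add the doctests found in the other module.
    combined.addTests(doctest.DocTestSuite(helpers))
    return combined


result = unittest.TextTestRunner().run(suite())
```

The runner should report two tests: the unittest method and the doctest in `double`.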

In which I finally learn what’s in a TrueType font

November 1, 2008

Last week someone posted an interestingly bizarre problem in the LilyPond newsgroup: using Times New Roman on Vista, the letter N becomes “Ị.” Go figure. Debugging that seemed like a fun puzzle, so I looked into it a bit, and concluded that there was a bug in the font. Someone who knows more than I do diagnosed it more completely: it turns out that the ‘post’ table assigns the name ‘N’ to three different characters, confusing LilyPond (or pango, or freetype, or whatever). Microsoft already knows about this, but has no plans to do anything, presumably because Microsoft software doesn’t use the post table, and Microsoft doesn’t care about any stinkin’ software other than its own.

For reasons that escape me, this was enough to inspire me to learn what’s inside a TrueType font. The format is, not surprisingly, both simple and Byzantine. I’ve cobbled together a python program to fix the problem with MS’s TNR. In case anyone is curious, it’s below the fold. For heaven’s sake don’t assume it won’t ruin anything you run through it.


A little Django flatpage trick

May 31, 2008

For each general area of my humble little website, I have a base template that takes care of a links bar and breadcrumbs and things. All the individual pages in that area extend that base.html, which in turn extends parent base templates. Nothing unusual there, and all well and good.

Except for flatpages. Flatpages are great, but out of the box they’re a tiny bit rigid. I want a flatpage in a particular area to extend the right base.html, but that’s not quite what the flatpage template setting provides—that’s a whole template, and I just want a base to extend. One could of course have a different flatpage template for each base.html, but that’s gross, and not very DRY.

Maybe there is some simple obvious way to do this built in to Django, but I couldn’t find it. My solution is to extract the appropriate base template from the flatpage itself. I put it in a django comment, which will get stripped out of the content before rendering. So the line

{# Base utilities/base.html #}

goes in the body of the flatpage, and gets extracted with a filter. [It might be better to infer it from the page’s url, if one’s directory structure and url structure always match.] The only problem is that I need the filter loaded before the extends tag, and the extends tag needs to come before anything else. It used to be possible to load before extending, but that was an evil (if useful) loophole, now closed.

Like all problems in computer science, this can be solved with another level of indirection. flatpages/default.html loads the filter and extracts the base template name, and then includes another template to do the actual rendering.

Here’s the code, simple and completely non-robust though it is. In a templatetags/flatpage_utils.py or whatever you want to call it:

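import re

from django import template
from django.template.defaultfilters import stringfilter

register = template.Library()

@register.filter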
@stringfilter
def stripdjangocomments(text):
    """
    Strip django comments from the text.
    """
    s = re.sub(r'{#.*?#}', '', text)
    return s

@register.filter
@stringfilter
def getbase(text, default = "base.html"):
    """
    Look for a string of the form {# Base foo #} and return foo
    """
    m = re.search(r'{#\s*Base\s*(\S*?)\s*#}', text)
    if m and m.groups()[0]:
        return m.groups()[0]
    else:
        return default

In templates/flatpages/default.html

{% load flatpage_utils %}

{% with flatpage.content|getbase as pagebase %}
{% include "flatpages/flatpagebody.html" %}
{% endwith %}

And in templates/flatpages/flatpagebody.html

{% extends pagebase %}
{% load whatever_else %}

{% block title %}
{{ flatpage.title }}
{% endblock %}

{# maybe other stuff #}

{% block content %}
{# add more filters if you like #}
{{ flatpage.content|stripdjangocomments }}
{% endblock %}

And that’s it.
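
To see what the two filters do without setting up Django, here is a small stdlib-only sketch. The function names and the sample flatpage content are made up for illustration, and the patterns are close to (not identical to) the ones above: the braces are escaped and at least one space is required after "Base".

```python
import re

# Patterns modelled on the filters above, with the braces escaped.
BASE_RE = re.compile(r'\{#\s*Base\s+(\S+?)\s*#\}')
COMMENT_RE = re.compile(r'\{#.*?#\}')


def get_base(text, default="base.html"):
    """Return foo from a '{# Base foo #}' marker, or the default."""
    m = BASE_RE.search(text)
    return m.group(1) if m else default


def strip_django_comments(text):
    """Remove every single-line {# ... #} comment from the text."""
    return COMMENT_RE.sub('', text)


# Hypothetical flatpage body carrying the marker on its first line.
content = "{# Base utilities/base.html #}\n<p>Hello, flatpage.</p>"
base = get_base(content)
body = strip_django_comments(content)
```

Here `base` ends up naming the template to extend, while `body` is the content with the marker removed, which is what the two-template arrangement relies on.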

Unicode, Browsers, Python, and Kvetching

May 28, 2008

My HTML/unicode character utility is now in a reasonably usable state. I ended up devoting rather more effort to it than I had originally planned, especially given that there are other perfectly useful such things out there. But once you start tweaking, it’s hard to stop. There are now many wonderful subtleties there that no one but me will ever notice.

What gave me the most grief was handling characters outside the Basic Multilingual Plane, i.e. those with codes above 0xFFFF. That’s hardly surprising. And I suppose it shouldn’t be surprising that browsers handle them so inconsistently. All four major browsers try to display BMP characters using whatever fonts are installed, but not so for the higher ones. In detail:

  • Firefox makes a valiant effort to display them, using whatever installed fonts it can find. It’s fairly inconsistent about which ones it uses, though.
  • IE7 and Opera make no effort to find fonts with the appropriate characters. They do work if you specify an appropriate font.
  • Safari (on Windows) doesn’t display them even if you specify a font. This does not further endear Safari to me.

Oh, and on a couple of XP machines I had to reinstall Cambria Math (really useful for, you know, math) to get the browsers to find it. There must be something odd about how the Office 2007 compatibility pack installed its fonts the first time (I assume that’s how they got there).

On the server side, I knew I would have to do some surrogate-pair processing myself, and that didn’t bother me. Finding character names and the like was more annoying. I was delighted with python’s unicodedata library until I started trying to get the supplementary planes to work. The library restricts itself to the BMP, presumably because python unicode strings have 16-bit characters. The reason for the restriction is somewhat obscure to me—the library’s functions could presumably work either with single characters or surrogate pairs; and I’m pretty sure all the data is actually there (the \N{} for string literals works for supplementary-plane characters, for example).

The whole unicode range ought to work in wide builds of python, but I have no idea if that would work with Django and apache/mod_python and Webfaction, and I’m far too lazy to try. So I processed the raw unicode data into my own half-assed extended unicode library, basically just a ginormous dict with a couple of functions to extract what I want (so far just names, categories and things to come if I ever get around to it).
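
The surrogate-pair arithmetic itself is short. The following is a sketch rather than the code from my utility: the function names are invented here, and it assumes a current Python 3, where strings are no longer limited to 16-bit units and unicodedata does cover the supplementary planes.

```python
import unicodedata


def to_surrogates(cp):
    """Split a supplementary-plane code point into (high, low) surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point: %#x" % cp)
    offset = cp - 0x10000
    # The top 10 bits go into the high surrogate, the bottom 10 into the low.
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


def from_surrogates(high, low):
    """Recombine a surrogate pair into the code point it encodes."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


G_CLEF = 0x1D11E
pair = to_surrogates(G_CLEF)
# On Python 3 the name lookup works directly on the full code point.
name = unicodedata.name(chr(G_CLEF))
```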

Some AJAX in Django

May 11, 2008

Months ago I started looking into doing AJAXy things within Django, and (typically for me) never actually did any of them. Finally I’ve started looking at that again. My needs are simple and dull: I just wanted quick and seamless responses to changes in form data in the little utilities I just added to my website.

Now I know very little about Ruby on Rails, but some of what I’ve seen of it does look kinda cool. In particular I liked the respond_to gadget, which switches on requested mimetypes to figure out what response to send from a view (or action, or whatever they’re called on Rails). That seems to allow nice code factoring with minimal syntax, in a way that’s concise and clever (typical for Ruby) and clear (not so typical, IMO…).

I’m not convinced this is a truly great idea, for reasons I’ll detail below, but what the hey, when did that ever stop anyone? So I hacked up a python/Django analogue (see the end of the post). I may of course have misunderstood completely what’s up with the Ruby thing, in which case, oh well.

Here’s an example of how you use this thing—a Responder object—in a view:

def index(request):
    data = { 'foo' : 'bar', 'this' : 'that' }
    responder = Responder(request, 'template', data)

    responder.html

    responder.js

    return responder.response()

This says, more or less “If the request wants HTML, render the data with the template template.html. If it wants javascript, render with the template template.js and the javascript mimetype.” That is, it’s something like

def index(request):
    data = { 'foo' : 'bar', 'this' : 'that' }

    if <wants html>:
        return render_to_response('template.html', data)

    if <wants js>:
        return render_to_response('template.js', data,
            mimetype='text/javascript' )

    # neither was requested: fall back to some default response
    return render_to_response('template.html', data)

[where that <wants html/javascript> conceals some complexity…]
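
To give a feel for that hidden complexity, here is a stdlib-only sketch of the decision: parse the Accept header into weighted types, then pick whichever offered response type the client likes best. The names `parse_accept`, `wanted` and `OFFERED` are made up for this illustration, and the sketch ignores partial wildcards such as text/*.

```python
def parse_accept(header):
    """Turn an Accept header into a list of (mimetype, q) pairs."""
    pairs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # unparseable weight: treat as not wanted
        pairs.append((parts[0], q))
    return pairs


def wanted(header, offered):
    """Return the name of the best-liked offered type, or None."""
    priorities = {}
    for mimetype, q in parse_accept(header):
        priorities.setdefault(mimetype, q)  # first mention wins
    default_q = priorities.get("*/*", 0.0)
    best, best_q = None, 0.0
    for name, mimetypes in offered:
        for mimetype in mimetypes:
            q = priorities.get(mimetype, default_q)
            if q > best_q:
                best, best_q = name, q
    return best


# Hypothetical table of response types a view is willing to produce.
OFFERED = [
    ("html", ("text/html",)),
    ("js", ("text/javascript", "application/javascript")),
]
```

With this, `wanted('text/javascript;q=1, */*;q=0.1', OFFERED)` picks the javascript response, while a typical browser page request picks html.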

The render-a-template behavior can be overridden: those hacky ‘html’ and ‘js’ attributes are callable. If one of them is passed a function, it calls it: if the function returns something, that something is used as the response. It can also modify data and return None to proceed with default handling. Here’s an example I used when testing this stuff on different browsers. It prints the contents of the HTTP_ACCEPT header, and provides a button to fire an ajax request to replace that. In this case I built the javascript messily by hand.

def index(request):
    raw = request.META['HTTP_ACCEPT']
    types = parseAccept(request)

    responder = Responder(request, 'index.html',
            { 'raw' : raw, 'types' : types })

    responder.html

    @responder.js
    def jsresp(*args, **kwargs):
        text = raw + '<br><br>' + \
            '<br>'.join('%s %g' %(s, q) for s,q in types)
        js = "$('content').update('%s');" % text
        return HttpResponse(js, mimetype='text/javascript')

    return responder.response()

Here’s the corresponding template (which uses prototype):

<script src="/static/js/scriptaculous-js-1.8.1/lib/prototype.js" type="text/javascript"></script>
<script type="text/javascript">
    function ajaxUpdate () {
        headers = { Accept : 'text/javascript;q=1, */*;q=0.1' };
        if(Prototype.Browser.Opera)
        {
            headers.Accept += ',opera/hack'
        }

        new Ajax.Request('/',
            { method:'get', parameters: {}, requestHeaders : headers } );
    };
</script>

<div id="content">
    {{ raw }}<br><br>
    {% for s in types %}
    {{ s.0 }} {{ s.1 }}<br>
    {% endfor %}
</div>
<div>
    <br>
    <input id="b1" onclick="ajaxUpdate();" type="button" value="Click Me!">
    </input>
</div>
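
The override hook used in the view above (an attribute that doubles as a decorator) can be sketched without Django. This is a simplified illustration, not the Responder itself: the `Handler` class, `render_default` and the handler functions are invented names, and plain strings stand in for HTTP responses.

```python
class Handler(object):
    """Holds an optional override function plus a default renderer."""

    def __init__(self, default):
        self.default = default
        self.fn = None

    def __call__(self, fn):
        # Used as a decorator: remember the override, hand it back unchanged.
        self.fn = fn
        return fn

    def respond(self, data):
        if self.fn is not None:
            result = self.fn(data)
            if result is not None:
                return result  # the override produced its own response
        # No override, or it returned None: default handling proceeds,
        # seeing any changes the override made to data.
        return self.default(data)


def render_default(data):
    return "default: %s" % sorted(data.items())


js_handler = Handler(render_default)
json_handler = Handler(render_default)


@js_handler
def js_override(data):
    return "custom js"


@json_handler
def json_override(data):
    data['foo'] = 'baz'  # modify the data and fall through (returns None)
```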

So What Have I Learned From This? Well, it all seems to work, so I’ll keep using it. But I’m not totally sold that this—switching on HTTP_ACCEPT, and my own particular implementation—is the Right Way to do things.

Philosophically, the general idea seems awfully prone to abuse. As I understand RESTful web services (i.e. not very well), different requests correspond to different representations of the same underlying data. But are the original html and the javascript that updates it really different representations of the same thing, or different animals altogether? I think that’s a murky point, at best. And I should think that in real life situations it could get messy. What happens, for example, if there is more than one sort of javascript request (e.g. if there are different forms on a page that do fundamentally different things)?

Rails and REST fans, please set me straight here!

Practically, the HTTP_ACCEPT thing seems delicate. I had to futz around a bit to get it to work in a way I felt at all confident of. Browsers seem to have different opinions about what they should ask for. Oddly, the browser that caused me the most problems was Opera—despite what I told prototype’s AJAX request, Opera insisted on concatenating the ACCEPTed mimetypes with the original request’s mimetypes. I hacked around that by throwing in a fake mimetype to separate the requests I wanted from those Opera wants; see the template above and the code below.
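
The sentinel trick amounts to cutting the header off at the fake type. A tiny sketch, assuming the concatenated header looks like the made-up example in the test below (the function name is also invented):

```python
def requested_items(header, sentinel="opera/hack"):
    """Keep only the Accept items that come before the sentinel type."""
    kept = []
    for item in header.split(","):
        mimetype = item.split(";")[0].strip()
        if mimetype == sentinel:
            break  # everything after this was appended by the browser
        kept.append(item.strip())
    return kept
```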

So anyway, maybe it would be better, or at least more Django, to be explicit about these AJAX requests, and either give them different URLs (and factor common code out of the various views) or add a piece of get/post data, as here. For now I’ll keep doing what I’m doing, and see if I run into problems.

Here’s the Responder code. It has numerous shortcomings, so use at your own risk. It is completely non-bulletproof (and non-debugged), and won’t work if you don’t use it just like I wanted to use it (e.g. you’d better give it a template name). It obviously needs more mimetype knowledge: it falls back on python’s mimetypes library, but that seems seriously unacceptable here. And I’m very lame about how I parse the HTTP_ACCEPT strings.

import sys
import re
import os, os.path, mimetypes
import django
from django.http import HttpResponse
from django.shortcuts import render_to_response
from django.template import RequestContext

class _ResponseHelper(object):
    def __init__(self, ext, mimetypes, responder):
        self.responder = responder
        self.ext = ext
        self.mimetypes = mimetypes
        self.fn = None

    def __call__(self, fn=None):
        self.fn = fn
        return fn

class Responder(object):
    """
    Utility for 'RESTful' responses based on requested mimetypes,
    in the request's HTTP_ACCEPT field, a la Rails' respond_to.

    To use, create a responder object.  Pass it the request object
    and the same arguments you would pass to render_to_response.
    Omit the file extension from the template name---it will be added
    automatically.
    For each type to be responded to, reference an attribute of the
    appropriate name (html, js, etc).
    Call the response method to create a response.
    The response will be created by appending the extension to the filename
    and rendering to response, with the appropriate mimetype.

    To override the default behavior for a given type, treat its
    attribute as a function, and pass a function to it.
    It will be called with the same arguments as the Responder's constructor.
    The function can modify the passed data, and either return None
    (in which case the template handling proceeds), or return a response.
    Function decorator syntax is a convenient way to do this.

    Example:

        responder = Responder(request, 'mytemplate', { 'foo': 'bar' })

        responder.html

        @responder.json
        def jsonresp(request, templ, data):
            data['foo'] = 'baz'

        @responder.js
        def jsresp(request, templ, data):
            return HttpResponse(someJavascript,
                mimetype='application/javascript')

        return responder.response()

    Here an html request is processed as usual.
    A JSON request is processed with changed data.
    A JS request has its own response.

    """
    types = { 'html' : ('text/html',),
              'js' : ('text/javascript',
                      'application/javascript',
                      'application/x-javascript'),
              'json' : ('application/json',),
            }

    def __init__(self, request, *args, **kwargs):
        self.request = request
        self.resp = None
        self.args = [a for a in args]
        self.kwargs = kwargs
        self.priorities = {}
        for t, q in parseAccept(request):
            self.priorities.setdefault(t, q)
        self.defq = self.priorities.get('*/*', 0.0)
        self.bestq = 0.0

    def maybeadd(self, resp):
        try:
            thisq = self.bestq
            for mt in resp.mimetypes:
                q = self.priorities.get(mt, self.defq)
                if q > thisq:
                    resp.mimetype = mt
                    self.resp = resp
                    self.bestq = q
        except:
            pass

    def response(self):
        if self.resp:
            if self.resp.fn:
                result = self.resp.fn(self.request, *self.args, **self.kwargs)
                if result:
                    return result

            # the template name ought to be the first argument
            templ = self.args[0]
            base, ext = os.path.splitext(templ)
            if not ext:
                templ = "%s.%s" % (base, self.resp.ext)
            self.args[0] = templ
            self.kwargs['mimetype'] = self.resp.mimetype
        # if there wasn't a response, default to here
        response = render_to_response(
                context_instance=RequestContext(self.request),
                *self.args, **self.kwargs)
        return response

    def __getattr__(self, attr):
        mtypes = None
        if attr not in self.types:
            mtypes = [mt for mt, enc in [mimetypes.guess_type('.'+attr)]
                        if mt]
        else:
            mtypes = self.types[attr]
        if mtypes:
            resp = _ResponseHelper(attr, mtypes, self)
            self.maybeadd(resp)
            return resp
        else:
            return None

def parseAccept(request):
    """
    Turn a request's HTTP_ACCEPT string into a list
    of mimetype/priority pairs.
    Includes a hack to work around an Opera weirdness.
    """
    # a missing Accept header means the client will take anything
    strings = request.META.get('HTTP_ACCEPT', '*/*').split(',')
    r = re.compile(r'(.*?)(?:\s*;\s*q\s*\=\s*(.*))?$')
    types = []
    for s in strings:
        m = r.match(s)
        q = float(m.groups()[1]) if m.groups()[1] else 1.0
        t = m.groups()[0].strip()
        if t == 'opera/hack':
            break
        types.append((t, q))
    return types

Some HTML text utilities

May 8, 2008

I’ve just added some utilities to my website:

  • A converter that takes HTML entities to/from the characters they represent. The input can be actual characters (e.g. þ), named entity references (&thorn;), or numeric references, decimal or hexadecimal. It also accepts some abbreviations (two back-ticks for “, for example), which I’ll eventually document. Maybe.
  • Lorem Ipsum text, with settable font, font size, and line height.
  • A list of named HTML entities. (Yes, that’s easy to find, but I wanted a place I could get to easily.)

[Yes, I know all these things are easily available many places. I wanted to be able to get to them without having to think about it, and to be able to fiddle with the details.]

I wrote these for a target user base of one—me—so there’s no particular reason to think that they’ll be useful for anyone else. They’re also in a bit of a raw and unfinished state (in which they’ll stay until I get around to doing something about it). But hey, use them if you like.

All the entity names and unicode descriptions come from the python unicodedata and htmlentitydefs libraries. I love the way python includes stuff like that.

Django QuerysetRefactor

April 27, 2008

In major Django news, Malcolm Tredinnick’s long-awaited QuerysetRefactor branch is in for real; huzzah! This has little immediate impact on my tiny site. It did allow (and require, as I expected) me to remove the QLeftOuterJoin workaround from Django Snippets I used in a couple of places. It also fixes other problems I’ve run into before (with ordering across relations, for example) and looks to be a major nicification in general. I’m very impressed that so major an internals change could be done with so few backwards incompatibilities.

Ruby-like expression substitution in Python

February 28, 2008

I don’t know much Ruby, and probably won’t learn; all that syntax and magic scare me away. But I have to admit it has some darned useful gadgets. Here’s a python function I hacked up to do something much like Ruby’s expression-substitution, using the same #{ } syntax. It doesn’t allow curly braces inside the #{ }; were I a little less lazy I would put in some escaping.


import re
import sys

def esub(s):
    """
    Perform Ruby-like expression substitution.

    >>> x=3
    >>> y='A'
    >>> esub('abc#{x}def#{3+5}hij#{"".join([y, y])}')
    'abc3def8hijAA'
    """
    restr = r'(?:#{(?P<exp>[^{}]*)})|(?:[^#])+|#'
    fr = sys._getframe(1)
    def process(m):
        txt = m.group('exp')
        if txt is not None:
            val = eval(txt, fr.f_globals, fr.f_locals)
            return type(s)(val)
        else:
            return m.group()
    return ''.join(process(m) for m in re.finditer(restr, s))
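
A variant worth considering, for anyone uneasy about sys._getframe: pass the namespace in explicitly. This is a sketch with an invented name (`esub_ns`), not a replacement for the function above, and it has the same limitation about curly braces inside the #{ }. As with the original, it calls eval, so it should only ever see trusted strings.

```python
import re

# Matches #{ ... } with no curly braces inside the expression.
_EXPR = re.compile(r'#\{([^{}]*)\}')


def esub_ns(s, namespace):
    """Substitute #{expr} in s, evaluating expr against the given namespace."""
    def replace(m):
        # eval on trusted input only; names are looked up in namespace
        return str(eval(m.group(1), {}, namespace))
    return _EXPR.sub(replace, s)
```

The caller decides exactly which names are visible, e.g. `esub_ns('abc#{x}def', {'x': 3})`, at the cost of a little more typing than the frame-peeking version.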

Spellchecking in python

January 7, 2008

FWIW, here’s the script I threw together to extract the wordlist I mentioned in the previous post:

#! /usr/bin/env python2.5

from __future__ import with_statement

import os
import re
import sys

from optparse import OptionParser

def worditer(wordsin, dict = None):
    r = re.compile(r'[#&]')
    if dict:
        cmd = 'aspell -a --lang=%s' % dict
    else:
        cmd = 'aspell -a'
    i, o = os.popen2(cmd)
    # skip first line
    o.readline()
    for w in wordsin:
        if w:
            i.write(w + '\n')
            i.flush()
            result = o.readline()
            if result and result != '\n':
                o.readline()
                if r.match(result):
                    # add the word for this session
                    i.write('@%s\n' % w)
                    yield w

def dowords(wordsin, outstr, dict):
    for w in worditer(wordsin, dict):
        outstr.write(w + '\n')

def filewordsiter(filenames):
    regex = re.compile(r'\W+')  # split on runs of non-word characters
    for fname in filenames:
        with open(fname) as f:
            for line in f:
                for w in regex.split(line):
                    yield w

def dofiles(filenames, outstream, dict):
    dowords(filewordsiter(filenames), outstream, dict)

def main():
    parser = OptionParser()
    parser.add_option('-d', '--dict', dest = 'dict',
                        help = 'Dictionary to use')
    parser.add_option('-o', '--out', dest = 'outfile',
                        help = 'Output file, stdout if none')
    options, filenames = parser.parse_args()
    if options.outfile:
        outstr = open(options.outfile, "w")
    else:
        outstr = sys.stdout
    dofiles(filenames, outstr, options.dict)

if __name__ == '__main__':
    main()
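
The part of the script that decides whether a word is misspelled relies on the ispell-style pipe output that aspell -a produces: a line starting with * means the word is fine, & means misspelled with suggestions, and # means misspelled with none. A stdlib-only sketch of just that classification follows; the function name is invented and the sample lines are hand-written in that format, not captured from a real aspell run.

```python
def misspelled(result_lines):
    """Pick the flagged words out of ispell-style pipe-mode output lines."""
    words = []
    for line in result_lines:
        # '&' = misspelled with suggestions, '#' = misspelled, no suggestions
        if line.startswith(("&", "#")):
            words.append(line.split()[1])
    return words


# Hand-written sample in the pipe-mode format (not real aspell output).
SAMPLE = [
    "*",
    "",
    "& teh 3 0: the, tea, ten",
    "",
    "# xyzzyq 0",
    "",
]
```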

Python and the clipboard

December 31, 2007

I liked the idea (found via Alex) of modifying clipboard data in place, and decided to steal it for my own use, in python, on win32 and ubuntu. Since it was surprisingly annoying to figure out how to use the clipboard in either case, I’ll post what I ended up with, so that I can find it again myself.

This works with gtk or win32; the latter requires pywin32:

#! /usr/bin/env python2.5

from __future__ import with_statement
from contextlib import contextmanager

try:
    import win32clipboard as wcb
    import win32con

    @contextmanager
    def WinClipboard():
        """
        A context manager for using the windows clipboard safely.
        """
        # open first, so we only close a clipboard we actually opened
        wcb.OpenClipboard()
        try:
            yield
        finally:
            wcb.CloseClipboard()

    def getcbtext():
        with WinClipboard():
            return wcb.GetClipboardData(win32con.CF_TEXT)

    def setcbtext(text):
        with WinClipboard():
            wcb.EmptyClipboard()
            wcb.SetClipboardText(text)

except ImportError, e:
    # try gtk.  If that doesn't work, just let the exception go
    import gtk

    def getcbtext():
        return gtk.Clipboard().wait_for_text()

    def setcbtext(text):
        cb = gtk.Clipboard()
        cb.set_text(text)
        cb.store()

def replaceclipboard(fn):
    """
    Modify text on the clipboard.

    fn: a callable object that maps strings to strings.

    >>> setcbtext("This is some text.")
    >>> replaceclipboard(lambda s : s.upper())
    >>> getcbtext()
    'THIS IS SOME TEXT.'
    """
    text = getcbtext()
    newtext = fn(text)
    setcbtext(newtext)

def _test():
    import doctest
    doctest.testmod()

if __name__ == '__main__':
    _test()
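
Since the real thing needs pywin32 or gtk, here is a sketch of the same open/modify/close pattern against an in-memory stand-in. `FakeClipboard`, `opened` and `replace_text` are invented for this illustration; the point is only that the context manager closes the resource even when the transforming function raises.

```python
from contextlib import contextmanager


class FakeClipboard(object):
    """In-memory stand-in for a clipboard that must be opened and closed."""

    def __init__(self, text=""):
        self.text = text
        self.is_open = False

    def open(self):
        self.is_open = True

    def close(self):
        self.is_open = False


@contextmanager
def opened(cb):
    cb.open()  # acquire outside the try, so a failed open is not "closed"
    try:
        yield cb
    finally:
        cb.close()


def replace_text(cb, fn):
    """Replace the clipboard text with fn(text), closing afterwards."""
    with opened(cb):
        cb.text = fn(cb.text)
```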