Archive for the ‘Programming’ Category


November 13, 2011

I’ve been looking at node.js, the server-side (more general than that, really) javascript execution environment. My gut reaction is that I like it–I like it a lot. But I think it will take some serious investigation to determine whether it’s ready for an industrial-strength application.

Now some details, aimed at node-n00bs such as myself. Anyone who would like to point out how badly I’ve gotten things wrong, feel free!

First, it’s important to know that node.js is architected to solve a specific problem, to wit, scalability in the presence of lots of concurrent access. The usual way something like apache/php handles concurrency is to spawn a thread for each server request. But threads have overhead, and there’s only so far you can push that before you have to buy more servers.

node.js has pretty much the opposite philosophy. The buzzwords you see are “asynchronous” and “event-driven” or just “evented”—its central element is a single-threaded event loop. But that doesn’t tell you much about why it’s a good idea. I found a much more revealing tagline here: “everything runs in parallel, except your code.”

The idea is that in a typical (non-trivial) server request most of the processing time is taken up in things like database or filesystem access, henceforth referred to generically (and not always completely correctly) as “IO.” Those are things that either don’t take a lot of CPU cycles, or at least are already in their own threads or processes. If the thread running those IO operations waits for them to complete, it will be sitting idle; in a single-threaded event loop that means it will block anything waiting for it. So in a single-threaded event loop, you don’t wait! IO operations in node.js don’t return their data directly; instead, they accept callbacks to process the results when they’re ready. Those callbacks are themselves processed in the event loop.

The callback functions themselves should all be lightweight, so that tens of thousands of them can be executed per second. They palm off all the hard work to “IO functions,” black boxes that may in turn add more callbacks. The node.js API, and good node.js plugin modules, are structured to make it positively difficult to do anything that blocks.

Here’s a “hello world” webserver, straight from the node.js front page, that responds to any request with, er, “hello world”:

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, "");

The guts of that there–the argument to http.createServer–is the callback that gets called from the event loop whenever the server fires the “request” event (you don’t see the event loop yourself; you just add callbacks for events, and they are called from the event loop). It does the actual responding by calling methods on res, a ServerResponse object.

Of course a real response will be more complicated (starting with url-parsing, which I’ll ignore completely). Traditionally that might look something like

http.createServer(function (req, res) {
  var data = do_some_io_operation(req);
  var output = do_some_more_processing(data);
  res.writeHead(200, {'Content-Type': 'text/html'});
  res.end(output);
}).listen(1337, "");

But those two function calls, if they do anything IO-ish or otherwise nontrivial, are not kosher node.js. Instead, it should look something like this:

http.createServer(function (req, res) {
  do_some_io_operation(req, function(data) {
    do_some_more_processing(data, function(output) {
      res.writeHead(200, {'Content-Type': 'text/html'});
      res.end(output);
    });
  });
}).listen(1337, "");

What happens on a request is:

  • The handler calls do_some_io_operation, which registers a callback and fires off whatever the operation is. It–and the handler–return immediately, and processing can move on to anything that’s waiting in the event queue.
  • When the io operation completes, the first inner callback gets executed. That calls do_some_more_processing, which registers yet another callback, starts whatever it starts, and returns.
  • When THAT finishes, the inner callback finally takes all the data it now has available and finishes responding to the request.

Database access might look something like this (not real code, but it might be close, modulo error handling):

http.createServer(function (req, res) {
  dbase.connect('tcp://whateverdbase@localhost/whatever', function(connection) {
    var query = connection.query('SELECT * FROM some_table');
    res.writeHead(200, {'Content-Type': 'text/html'});
    query.on('row', function(row) {
      res.write('<p>' + row + '</p>');
    });
    query.on('end', function() {
      res.end();
    });
  });
}).listen(1337, "");

Connecting to the dbase is an IO operation, and since everything depends on that almost the whole response function is inside a callback. Setting up the query as I have it here doesn’t necessarily do anything immediately, so it does NOT need to take a callback, but that would be an alternative API. However, you do have to wait for the results of the query, so it gets callbacks for its “row” and “end” events.

I rather like this functional quasi-continuation-passing style of programming, mostly because I like finding out that abstruse theoretical concepts turn out to be useful for real. It might get a bit messy in real examples, though. At the very least I wonder if it would call for a different indentation style.
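
One way to keep the nesting under control (just a sketch, using the same made-up do_some_io_operation and do_some_more_processing as above) is to name the callbacks instead of nesting them anonymously:

```javascript
// The same hypothetical handler as before, flattened by naming each
// stage instead of nesting anonymous callbacks.
function handle(req, res) {
  do_some_io_operation(req, function (data) {
    processData(data, res);
  });
}

function processData(data, res) {
  do_some_more_processing(data, function (output) {
    sendReply(output, res);
  });
}

function sendReply(output, res) {
  res.writeHead(200, {'Content-Type': 'text/html'});
  res.end(output);
}
```

The trade is nesting for a proliferation of small named functions; which is worse is a matter of taste.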

[And I have a “solution” for that. Or at least something that’s kept me happily occupied for the last couple of days. More on it later, if I ever get around to it.]

The fact that this is all in javascript has a few real advantages. Javascript is halfway to being a functional language, and is thus well suited to this style of programming. But it’s not Lisp or Haskell, so existing programmers don’t have to rewire their brains to use it. (I love Haskell, but I can’t imagine trying to find and manage a team to write a Real Product with it.) Indeed, any web programmer will already be fluent in javascript, and used to working with callbacks, if not quite to the pervasive level that node.js requires.

Using the same language on the client and server is a nice benefit, too. It makes it easy to share code between the two sides, something that can be useful (caveat: writing javascript that will work properly in a browser and inside node.js is NOT completely trivial, but it’s usually not that difficult either). And it’s nice for us programmers to avoid the annoying context switches between languages. Going from javascript to python, for example, I am forever forgetting to put quotes around dictionary keys.

And compared to other dynamic languages, javascript on google’s V8 engine, which node uses, is really fast. For “pure” stuff, just function calls and for loops and the like, it appears to be more in the C/C++ range than the Python/Ruby/php range.

Now for the downsides. Actually, for all I know there aren’t any prohibitive ones! But node.js is still relatively new, and although I think the core is relatively stable, the general ecosystem isn’t. There are lots and lots of modules for doing various things, but in this sort of open-source world it’s really difficult to know what you can trust. In the Perl world, for example, I’ve seen CPAN hailed as the greatest thing since sliced bread, but I’ve seen a lot of crap there, and sorting through it can be a real cost.

node.js itself is quite low-level, lower-level even than php. Someone needs to encapsulate even simple things like (e.g.) gathering GET and especially POST data into dictionaries of query values (not that that’s hard, but it does need to happen). There are some embryonic higher-level frameworks—Express looks very promising, and at least does that post-data processing—but I am fairly certain there’s nothing anywhere near as mature and trustworthy as Ruby on Rails or Django.

And conceptually not everyone agrees that eventing is the way to handle concurrency. There are lots of partisans of Erlang and the aforementioned Haskell and even of traditional threading who beg to differ. I’m a bit out of my depth here, so can’t comment usefully.


WebGL performance

June 16, 2011

It turns out the performance problem I mentioned in Chrome is entirely down to Float32Array. Known problem, apparently. In particular it looks to me like garbage collection, as it only shows up every few dozen frames (few hundred in less geometry-intensive cases).


June 14, 2011

I’ve been learning about WebGL recently, not that I have any particular reason to use it. My first experiments, selected mostly because they look neat, are here:

  • A Hopf fibration viewer. The Hopf fibration is interesting and important mathematically—it’s part of the reason homotopy theory turns out to be so much more complicated than homology, for example—but its real importance to me is that it makes for great pretty pictures.
  • A Mandelbrot set “explorer.” That’s the first thing everyone does when they learn about shaders, right?
  • A Julia set explorer. It turned out far trippier than I had hoped for.

I’m comfortably certain these are not models of good WebGL practice. (Or good html/javascript practice, for that matter).

So what have I learned? For starters, half the people I’ve tried to show these things to can’t run them, for whatever old-browser and old-graphics card/driver reasons. How long will it be before you can reasonably assume any random user is likely to be able to use this stuff? I suppose it does make sense to learn it now so as to be ready in five years when it’s generally supported. Or maybe it’s that the primary audience is gamers, happy to have an excuse to buy a new graphics monstrocard every few months.

As for the thing itself, it’s interesting comparing it to what little I remember from the days when I knew OpenGL. From a high level it’s pretty much what you’d expect from a translation of OpenGL, or a stripped-down version of it, into javascript, the most obvious difference being that you have to provide your own shaders. I’m rather glad to have a reason to learn about shaders, really; they’re new since my day.

I haven’t figured out all the nuances of GLSL. As an example of the sort of thing I’ve run into, the Mandelbrot fragment shader has a big loop to count iterations. for-loops in GLSL must be of a form like

    for(int i=0; i<CONST; i++) {

where CONST is some actual constant—I assume, possibly wrongly, that that’s so loops can be implemented by unrolling. That I learned quickly enough. Where I ran into problems was figuring out what the CONST could be. Some machines, at least older ones, seem to have a cutoff of 255, and behave oddly (it doesn’t look like a simple mod, but I haven’t tried to figure it out) if the bound exceeds that. The GLSL spec (which appears to be somewhat out of sync with what WebGL as implemented uses; am I looking in the wrong place, or otherwise missing something?) wasn’t much help there.
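
The workaround I've settled on (a sketch; MAX_ITER and the uniform name are my own choices) is to loop to a generous compile-time constant and break early on a runtime limit passed in as a uniform:

```glsl
const int MAX_ITER = 100;   // compile-time constant, as GLSL requires
uniform int u_maxIter;      // runtime iteration limit, set from javascript

int mandelbrotCount(vec2 c) {
    vec2 z = vec2(0.0);
    int count = 0;
    for (int i = 0; i < MAX_ITER; i++) {
        // breaking on a non-constant condition is allowed, even
        // though the loop bound itself must be constant
        if (i >= u_maxIter || dot(z, z) > 4.0) break;
        // z = z^2 + c, written out in components
        z = vec2(z.x * z.x - z.y * z.y, 2.0 * z.x * z.y) + c;
        count++;
    }
    return count;
}
```

This still leaves the question of how big MAX_ITER can safely be on a given card, of course.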

Back to WebGL vs. OpenGL in general, the other immediate difference is that there’s no more glBegin/glEnd: you have to do everything with buffers. That seems to add to the boilerplate. And of course a lot of the familiar OpenGL and glu methods for things like matrix handling are missing, so you have to provide them yourself. Or find a library that does them all. I don’t think I particularly like either of the ones I’ve seen, but haven’t really thought about them much yet. I can see performance being an issue with getting libraries right.

And finally it’s a bit annoying that WebGL only knows from floats, not doubles. That rather surprises me, but I don’t know enough about this stuff to rant without making a fool of myself.

As long as I can stick to my machine, a relatively beefy Macbook Pro, I’m impressed by how well this stuff works, despite the whingeing above. I haven’t done any real stress tests, but what I have done seems to work well and quickly. As expected, both Chrome and Firefox support it. Interestingly, the Firefox implementation seems to be noticeably more performant than Chrome’s. During animations Chrome seems to seize up (garbage collecting?) every couple of seconds. Firefox is nice and smooth.

Rambling Thoughts about Comonads

December 18, 2010

[Slightly revised since first posted.]

This entire post is, or is intended to be, a Literate Haskell file. You can copy-paste the whole thing into an .lhs file and run it with ghc (I vouch for it only in version 6.12.3). Some caveats: I am not a Haskell programmer. At worst you should suspect everything I say of being, well, wrong, and at best I’m comfortably certain the code in here is not as elegant as it ought to be. Apologies for all the references that I neglected to include either out of ignorance or out of laziness. And as will be clear I’ve been awfully sloppy throughout.

A while back I started thinking about comonads. I now have little idea why—“a while” is nearly two years—but I think I must have been troubled by the apparent lack of symmetry between monads and comonads in functional programming. It seemed somehow unfair that monads are so useful and get so much attention, while their poor duals are neglected. Really I just wondered whether some of the standard monad constructions and connections—monad notation, most obviously, and the connection with Applicative Arrows—had any dual constructions, and whether they might be useful. It turns out that there are indeed dual constructions, although I suppose I can’t truly swear to the usefulness part. Herein are most of my collected thoughts on the subject.

BTW, I have little idea how much of what follows is original, but a couple of things way down below the fold might be. You can easily find a fair bit about comonads and examples thereof, but I haven’t seen either real proposals for comonad notation (not that I’m claiming there’s one of those here either) or anything about “Coapplicative Arrows” elsewhere.


Django Testing

November 27, 2008

Like many a geek, I’m a lazy bastard, so it’s only recently I’ve gotten around to writing unit tests for my tiny website (I at least have the excuse of doing this only for jollies; I’m no professional web developer). I had vaguely assumed that writing (and maintaining) tests would be more trouble than it’s worth, and that it would be difficult to test Really Important Stuff anyway. I had also vaguely assumed that those vague assumptions were almost certainly wrong, and that I was a contemptible fool for not having written the tests up front.

It turns out, no surprise, that the latter vague assumption was correct. Python’s doctest and unittest frameworks are already relatively simple, and the django testing framework makes them simpler still (there’s a bit of annoying boilerplate to figure out in python’s raw unittest, which the django framework thoughtfully hides). The setup for test databases is especially nice.

[Mind you, in my first attempt I did somehow manage to blow away my local copy of my database—not just the test database, but the real one. I never did figure that one out…]

Embarrassingly, one of the first tests I wrote turned up a bug. Not to be wondered at, I suppose.

One note, on the very slim chance someone finds it useful. The django test framework looks for tests in each app’s models and tests modules. At first I was annoyed by this: I like to put doctests in other places, as they really belong in the docstrings of the functions and classes they test. But—and you already know this if you’re a better python programmer than I—there’s a trivial way around this. The unittest and doctest modules already coexist awfully well. doctest.DocTestSuite creates a TestSuite from a module’s doctests, and this can be amalgamated with unittests by defining a function called suite in the test module, which django will look for. That is, put something like this in or tests/

import unittest, doctest
import myapp.views, myapp.forms

def suite():
    # An easy way of finding all the unittests in this module
    suite = unittest.TestLoader().loadTestsFromName(__name__)
    # Fold in the doctests from the modules that carry them
    for mod in myapp.views, myapp.forms:
        suite.addTest(doctest.DocTestSuite(mod))
    return suite

In which I finally learn what’s in a TrueType font

November 1, 2008

Last week someone posted an interestingly bizarre problem in the LilyPond newsgroup: using Times New Roman on Vista, the letter N becomes “Ị.” Go figure. Debugging that seemed like a fun puzzle, so I looked into it a bit, and concluded that there was a bug in the font. Someone who knows more than I do diagnosed it more completely: it turns out that the ‘post’ table assigns the name ‘N’ to three different characters, confusing LilyPond (or pango, or freetype, or whatever). Microsoft already knows that, but has no plans to do anything, presumably because Microsoft software doesn’t use the post table, and Microsoft doesn’t care about any stinkin’ software other than their own.

For reasons that escape me, this was enough to inspire me to learn what’s inside a TrueType font. The format is, not surprisingly, both simple and Byzantine. I’ve cobbled together a python program to fix the problem with MS’s TNR. In case anyone is curious, it’s below the fold. For heaven’s sake don’t assume it won’t ruin anything you run through it.


A Javascript bug in NBC’s Olympics Website

August 17, 2008

Hey, I found a bug! The schedules in NBC’s Olympics website are supposed to be displayable in either Beijing time or your local time. This works only in IE7—so Firefox-using me ran into it the other night trying to find out when Michael Phelps would win his last medal.

Here’s the problem:

    var mts = document.getElementsByClassName ( 'timeConvertible' );
    mts.each(function(mt) {
        if(mt.readAttribute( 'title' ) != null && mt.readAttribute( 'title' ).length > 0)
            // etc

The error comes from the second line: “mts.each is not a function.”

What’s happening here? The website uses the prototype js library, which provides many nice features (although I’ve decided I prefer jQuery; I’ve been meaning to write about that for a while now). The js developer here didn’t read the prototype documentation cautioning against getElementsByClassName. In Firefox (and Opera) that is a native function, but in IE7 it’s not, so prototype defines it. Prototype’s version is more useful, returning a prototype Array object rather than an unmunged native NodeList. That prototype Array has the “each” function; the native NodeList doesn’t. Firefox and Opera’s superior js implementation leads to a worse result.
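
One defensive way around it (a sketch, not necessarily what NBC should do) is to never assume the return value has been extended, and copy whatever comes back into a plain array before iterating:

```javascript
// Copy anything array-like (a native NodeList or prototype's
// extended Array alike) into a plain javascript array.
function toArray(listish) {
  var a = [];
  for (var i = 0; i < listish.length; i++) {
    a.push(listish[i]);
  }
  return a;
}

// The handler would then do something like:
// var mts = toArray(document.getElementsByClassName('timeConvertible'));
// for (var i = 0; i < mts.length; i++) { /* etc. */ }
```

Or just use prototype's own $$('.timeConvertible'), which always returns an extended array in every browser.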

Debugging JavaScript, now in Opera

June 5, 2008

JavaScript is a neat little language, but (like countless others, apparently) I find it a real pain to debug. Part of my problem is me. I’m relatively new to JavaScript, which isn’t yet as embedded in my brain as, say, C++ is. I still suck as a JavaScript developer. But I refuse to take all the blame: part of the problem is the tools. Firebug and Firefox’s JavaScript Debugger (a.k.a Venkman) are both useful, but both are buggy and annoying—for some reason I have a terrible time with both getting breakpoints to work reliably. And the less said about the Microsoft Script Debugger the better.

So last week I downloaded a beta of Opera 9.5, which includes an alpha of Dragonfly, Opera’s new suite of developer tools (I have no idea why they called it an “alpha”; possibly they wanted to sow confusion to frighten away the rabble). And so far I’m pleased. It does have a few bugs—resizing the source window doesn’t immediately redisplay correctly, and expanding/collapsing/expanding objects in the frame inspection window doesn’t work—but nothing major. It’s also missing some fairly basic features, or has hidden them fairly effectively—there’s no watch window, and no way to display all breakpoints(!). But overall I’ve found it very nice.

It’s also got me using Opera more generally. I still prefer Firefox, for reasons that I’ll try to enumerate at some point, but Opera certainly has its merits.

A little Django flatpage trick

May 31, 2008

For each general area of my humble little website, I have a base template that takes care of a links bar and breadcrumbs and things. All the individual pages in that area extend that base.html, which in turn extends parent base templates. Nothing unusual there, and all well and good.

Except for flatpages. Flatpages are great, but out of the box they’re a tiny bit rigid. I want a flatpage in a particular area to extend the right base.html, but that’s not quite what the flatpage template setting provides—that’s a whole template, and I just want a base to extend. One could of course have a different flatpage template for each base.html, but that’s gross, and not very DRY.

Maybe there is some simple obvious way to do this built in to Django, but I couldn’t find it. My solution is to extract the appropriate base template from the flatpage itself. I put it in a django comment, which will get stripped out of the content before rendering. So the line

{# Base utilities/base.html #}

goes in the body of the flatpage, and gets extracted with a filter. [It might be better to infer it from the page’s url, if one’s directory structure and url structure always match.] The only problem is that I need the filter loaded before the extends tag, and the extends tag needs to come before anything else. It used to be possible to load before extending, but that was an evil (if useful) loophole, now closed.

Like all problems in computer science, this can be solved with another level of indirection. flatpages/default.html loads the filter and extracts the base template name, and then includes another template to do the actual rendering.

Here’s the code, simple and completely non-robust though it is. In templatetags/, or whatever you want to call it:

import re

from django import template

register = template.Library()

def stripdjangocomments(text):
    """Strip django comments from the text."""
    s = re.sub(r'{#.*?#}', '', text)
    return s
register.filter(stripdjangocomments)

def getbase(text, default="base.html"):
    """Look for a string of the form {# Base foo #} and return foo."""
    m ='{#\s*Base\s*(\S*?)\s*#}', text)
    if m and m.groups()[0]:
        return m.groups()[0]
    else:
        return default
register.filter(getbase)

In templates/flatpages/default.html

{% load flatpage_utils %}

{% with flatpage.content|getbase as pagebase %}
{% include "flatpages/flatpagebody.html" %}
{% endwith %}

And in templates/flatpages/flatpagebody.html

{% extends pagebase %}
{% load whatever_else %}

{% block title %}
{{ flatpage.title }}
{% endblock %}

{# maybe other stuff #}

{% block content %}
{# add more filters if you like #}
{{ flatpage.content|stripdjangocomments }}
{% endblock %}

And that’s it.

Unicode, Browsers, Python, and Kvetching

May 28, 2008

My HTML/unicode character utility is now in a reasonably usable state. I ended up devoting rather more effort to it than I had originally planned, especially given that there are other perfectly useful such things out there. But once you start tweaking, it’s hard to stop. There are now many wonderful subtleties there that no one but me will ever notice.

What gave me the most grief was handling characters outside the Basic Multilingual Plane, i.e. those with codes above 0xFFFF. That’s hardly surprising. And I suppose it shouldn’t be surprising that browsers handle them so inconsistently. All four major browsers try to display BMP characters using whatever fonts are installed, but not so for the higher ones. In detail:

  • Firefox makes a valiant effort to display them, using whatever installed fonts it can find. It’s fairly inconsistent about which ones it uses, though.
  • IE7 and Opera make no effort to find fonts with the appropriate characters. They do work if you specify an appropriate font.
  • Safari (on Windows) doesn’t display them even if you specify a font. This does not further endear Safari to me.

Oh, and on a couple of XP machines I had to reinstall Cambria Math (really useful for, you know, math) to get the browsers to find it. There must be something odd about how the Office 2007 compatibility pack installed its fonts the first time (I assume that’s how they got there).

On the server side, I knew I would have to do some surrogate-pair processing myself, and that didn’t bother me. Finding character names and the like was more annoying. I was delighted with python’s unicodedata library until I started trying to get the supplementary planes to work. The library restricts itself to the BMP, presumably because python unicode strings (in the usual narrow builds) have 16-bit characters. The reason for the restriction is somewhat obscure to me—the library’s functions could presumably work either with single characters or with surrogate pairs; and I’m pretty sure all the data is actually there (the \N{} escape in string literals works for supplementary-plane characters, for example).
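
For what it’s worth, the surrogate-pair arithmetic itself is simple enough. A sketch (the function name is mine):

```python
def to_surrogate_pair(cp):
    """Split a supplementary-plane code point (above 0xFFFF) into a
    UTF-16 high/low surrogate pair."""
    assert cp > 0xFFFF
    cp -= 0x10000                  # 20 bits remain
    high = 0xD800 + (cp >> 10)     # top 10 bits
    low = 0xDC00 + (cp & 0x3FF)    # bottom 10 bits
    return high, low

# U+1D49C MATHEMATICAL SCRIPT CAPITAL A splits into 0xD835, 0xDC9C
```
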

The whole unicode range ought to work in wide builds of python, but I have no idea if that would work with Django and apache/mod_python and Webfaction, and I’m far too lazy to try. So I processed the raw unicode data into my own half-assed extended unicode library, basically just a ginormous dict with a couple of functions to extract what I want (so far just names, categories and things to come if I ever get around to it).