Posts Tagged ‘Programming’

node.js

November 13, 2011

I’ve been looking at node.js, the server-side (more general than that, really) javascript execution environment. My gut reaction is that I like it–I like it a lot. But I think it will take some serious investigation to determine whether it’s ready for an industrial-strength application.

Now some details, aimed at node-n00bs such as myself. Anyone who would like to point out how badly I’ve gotten things wrong, feel free!

First, it’s important to know that node.js is architected to solve a specific problem, to wit, scalability in the presence of lots of concurrent access. The usual way something like apache/php handles concurrency is to spawn a thread for each server request. But threads have overhead, and there’s only so far you can push that before you have to buy more servers.

node.js has pretty much the opposite philosophy. The buzzwords you see are “asynchronous” and “event-driven” or just “evented”—its central element is a single-threaded event loop. But that doesn’t tell you much about why it’s a good idea. I found a much more revealing tagline here: “everything runs in parallel, except your code.”

The idea is that in a typical (non-trivial) server request most of the processing time is taken up in things like database or filesystem access, henceforth referred to generically (and not always completely correctly) as “IO.” Those are things that either don’t take a lot of CPU cycles, or at least are already in their own threads or processes. If the thread running those IO operations waits for them to complete, it will be sitting idle; in a single-threaded event loop that means it will block anything waiting for it. So in a single-threaded event loop, you don’t wait! IO operations in node.js don’t return their data directly; instead, they accept callbacks to process the results when they’re ready. Those callbacks are themselves processed in the event loop.

The callback functions themselves should all be lightweight, so that tens of thousands of them can be executed per second. They palm off all the hard work to “IO functions,” black boxes that may in turn add more callbacks. The node.js API, and good node.js plugin modules, are structured to make it positively difficult to do anything that blocks.

Here’s a “hello world” webserver, straight from the node.js front page, that responds to any request with, er, “hello world”:

var http = require('http');
http.createServer(function (req, res) {
  res.writeHead(200, {'Content-Type': 'text/plain'});
  res.end('Hello World\n');
}).listen(1337, "127.0.0.1");

The guts of that there–the argument to http.createServer–is the callback that gets called from the event loop whenever the server fires the “request” event (you don’t see the event loop yourself; you just add callbacks for events, and they are called from the event loop). It does the actual responding by calling methods on res, a ServerResponse object.

Of course a real response will be more complicated (starting with url-parsing, which I’ll ignore completely). Traditionally that might look something like

http.createServer(function (req, res) {
  var data = do_some_io_operation(req);
  var output = do_some_more_processing(data);
  res.writeHead(200, {'Content-Type': 'text/html'});
  res.end(output);
}).listen(1337, "127.0.0.1");

But those two function calls, if they do anything IO-ish or otherwise nontrivial, are not kosher node.js. This should be something like this:

http.createServer(function (req, res) {
  do_some_io_operation(req, function(data) {
    do_some_more_processing(data, function(output) {
      res.writeHead(200, {'Content-Type': 'text/html'});
      res.end(output);
    });
  });
}).listen(1337, "127.0.0.1");

What happens on a request is:

  • The handler calls do_some_io_operation, which registers a callback and fires off whatever the operation is. It–and the handler–return immediately, and processing can move on to anything that’s waiting in the event queue.
  • When the io operation completes, the first inner callback gets executed. That calls do_some_more_processing, which registers yet another callback, starts whatever it starts, and returns.
  • When THAT finishes, the inner callback finally takes all the data it now has available and finishes responding to the request.

Database access might look something like this (not real code, but it might be close, modulo error handling)

http.createServer(function (req, res) {
  dbase.connect('tcp://whateverdbase@localhost/whatever', function(connection) {
    var query = connection.query('SELECT * FROM some_table');
    res.writeHead(200, {'Content-Type': 'text/plain'});
    query.on('row', function(row) {
      res.write('<p>' + row + '</p>');
    });
    query.on('end', function(row) {
      res.end('Hello World\n');
    });
  });
}).listen(1337, "127.0.0.1");

Connecting to the dbase is an IO operation, and since everything depends on that almost the whole response function is inside a callback. Setting up the query as I have it here doesn’t necessarily do anything immediately, so it does NOT need to take a callback, but that would be an alternative API. However, you do have to wait for the results of the query, so it gets callbacks for its “row” and “end” events.

I rather like this functional quasi-continuation-passing style of programming, mostly because I like finding out the abstruse theoretical concepts turn out to be useful for real. It might get a bit messy in real examples, though. At the very least I wonder if it would call for a different indentation style .

[And I have a “solution” for that. Or at least something that’s kept me happily occupied for the last couple of days. More on it later, if I ever get around to it.]

The fact that this is all in javascript has a few real advantages. Javascript is halfway to being a functional language, and is thus well suited to this style of programming. But it’s not Lisp or Haskell, so existing programmers don’t have to rewire their brains to use it. (I love Haskell, but I can’t imagine trying to find and manage a team to write a Real Product with it.) Indeed, any web programmer will already be fluent in javascript, and used to working with callbacks, if not quite to the pervasive level that node.js requires.

Using the same language on the client and server is a nice benefit, too. It makes it easy to share code between the two sides, something that can be useful (caveat: writing javascript that will work properly in a browser and inside node.js is NOT completely trivial, but it’s usually not that difficult either). And it’s nice for us programmers to avoid the annoying context switches between languages. Going from javascript to python, for example, I am forever forgetting to put quotes around dictionary keys.

And compared to other dynamic languages, javascript on google’s V8 engine, which node uses, is really fast. For “pure” stuff, just function calls and for loops and the like, it appears to be more in the C/C++ range than the Python/Ruby/php range.

Now for the downsides. Actually, for all I know there aren’t any prohibitive ones! But node.js is still relatively new, and I although I think the core is relatively stable the general ecosystem isn’t. There are lots and lots of modules for doing various things, but in this sort of open-source world it’s really difficult to know what you can trust. In the Perl world, for example, I’ve seen CPAN hailed as the greatest thing since sliced bread, but I’ve seen a lot of crap there, and sorting through it can be a real cost.

node.js itself is quite low-level, lower-level even than php. Someone needs to encapsulate even simple things like (e.g.) gathering get and especially post data into dictionaries of query values and data (not that that’s hard, but it does need to happen). There are some embryonic higher-level frameworks—Express looks very promising, and at least does that post-data processing—I am fairly certain there’s nothing anywhere near as mature and trustworthy as Ruby on Rails or Django (or asp.net, if you like that sort of thing).

And conceptually not everyone agrees that eventing is the way to handle concurrency. There are lots of partisans of Erlang and the aforementioned Haskell and even of traditional threading who beg to differ. I’m a bit out of my depth here, so can’t comment usefully.

Advertisements

WebGL performance

June 16, 2011

It turns out the performance problem I mentioned in Chrome is entirely down to Float32Array. Known problem, apparently. In particular it looks to me like garbage collection, as it only shows up every few dozen frames (few hundred in less geometry-intensive cases).

Rambling Thoughts about Comonads

December 18, 2010

[Slightly revised since first posted.]

This entire post is, or is intended to be, a Literate Haskell file. You can copy-paste the whole thing into an .lhs file and run it with ghc (I vouch for it only in version 6.12.3). Some caveats: I am not a Haskell programmer. At worst you should suspect everything I say of being, well, wrong, and at best I’m comfortably certain the code in here is not as elegant as it ought to be. Apologies for all the references that I neglected to include either out of ignorance or out of laziness. And as will be clear I’ve been awfully sloppy throughout.

A while back I started thinking about comonads. I now have little idea why—“a while” is nearly two years—but I think I must have been troubled by the apparent lack of symmetry between monads and comonads in functional programming. It seemed somehow ufair that monads are so useful and get so much attention, while their poor duals are neglected. Really I just wondered whether some of the standard monad constructions and connections—monad notation, most obviously, and the connection with Applicative Arrows—had any dual consructions, and whether they might be useful. It turns out they there are indeed dual constructions, although I suppose I can’t truly swear to the usefulness part. Herein are most of my collected thoughts on the subject.

BTW, I have little idea how much of what follows is original, but a couple of things way down below the fold might be. You can easily find a fair bit about comonads and examples thereof, but I haven’t seen either real proposals for comonad notation (not that I’m claiming there’s one of those here either) or anything about “Coapplicative Arrows” elsewhere.

(more…)

Some AJAX in Django

May 11, 2008

Months ago I started looking into doing AJAXy things within Django, and (typically for me) never actually did any of them. Finally I’ve started looking at that again. My needs are simple and dull: I just wanted quick and seamless responses to changes in form data in the little utilities I just added to my website.

Now I know very little about Ruby on Rails, but some of what I’ve seen if does look kinda cool. In particular I liked the respond_to gadget, which switches on requested mimetypes to figure out what response to send from a view (or action, or whatever they’re called on Rails). That seems to allow nice code factoring with minimal syntax, in a way that’s concise and clever (typical for Ruby) and clear (not so typical, IMO…).

I’m not convinced this is a truly great idea, for reasons I’ll detail below, but what the hey, when did that ever stop anyone? So I hacked up a python/Django analogue (see the end of the post). I may have course have misunderstood completely what’s up with the Ruby thing, in which case, oh well.

Here’s an example of how you use this thing—a Responder object—in a view:

def index(request):
    data = { 'foo' : 'bar', 'this' : 'that' }
    responder = Responder(request, 'template', data)
            { 'raw' : raw, 'types' : types })

    responder.html

    responder.js

    return responder.response()

This says, more or less “If the request wants HTML, render the data with the template template.html. If it wants javascript, render with the template template.js and the javascript mimetype.” That is, it’s something like

def index(request):
    data = { 'foo' : 'bar', 'this' : 'that' }

    if <wants html>:
        return render_to_response('template.html', data)

    if <wants js>:
        return render_to_response('template.js', data,
            mimetype='text/javascript' )

    return responder.response()

[where that <wants html/javascript> conceals some complexity…]

The render-a-template behavior can be overridden: those hacky ‘html’ and ‘js’ attributes are callable. If one of them is passed a function, it calls it: if the function returns something, that something is used as the response. It can also modify data and return None to proceed with default handling. Here’s an example I used when testing this stuff on different browsers. It prints the contents of the HTTP_ACCEPT header, and provides a button to fire an ajax request to replace that. In this case I built the javascript messily by hand.

def index(request):
    raw = request.META['HTTP_ACCEPT']
    types = parseAccept(request)

    responder = Responder(request, 'index.html',
            { 'raw' : raw, 'types' : types })

    responder.html

    @responder.js
    def jsresp(*args, **kwargs):
        text = raw + '<br><br>' + \
            '<br>'.join('%s %g' %(s, q) for s,q in types)
        js = "$('content').update('%s');" % text
        return HttpResponse(js, mimetype='text/javascript')

    return responder.response()

Here’s the corresponding template (which uses prototype):

<script src="/static/js/scriptaculous-js-1.8.1/lib/prototype.js" type="text/javascript"></script>
<script type="text/javascript">
    function ajaxUpdate () {
        headers = { Accept : 'text/javascript;q=1, */*;q=0.1' };
        if(Prototype.Browser.Opera)
        {
            headers.Accept += ',opera/hack'
        }

        new Ajax.Request('/',
            { method:'get', parameters: {}, requestHeaders : headers } );
    };
</script>

<div id="content">
    {{ raw }}<br><br>
    {% for s in types %}
    {{ s.0 }} {{ s.1 }}<br>
    {% endfor %}
</div>
<div>
    <br>
    <input id="b1" onclick="ajaxUpdate();" type="button" value="Click Me!">
    </input>
</div>

So What Have I Learned From This? Well, it all seems to work, so I’ll keep using it. But I’m not totally sold that this—switching on HTTP_ACCEPT, and my own particular implementation—is the Right Way to do things.

Philosophically, the general idea seems awfully prone to abuse. As I understand RESTful web services (i.e. not very well), different requests correspond to different representations of the same underlying data. But are the original html and the javascript that updates it really different representations of the same thing, or different animals altogether? I think that’s a murky point, at best. And I should think that in real life situations it could get messy. What happens, for example, if there is more than one sort of javascript request (e.g. if there are different forms on a page that do fundamentally different things)?

Rails and REST fans, please set me straight here!

Practically, the HTTP_ACCEPT thing seems delicate. I had to futz around a bit to get it to work in a way I felt at all confident of. Browsers seem to have different opinions about what they should ask for. Oddly, the browser that caused me the most problems was Opera—despite what I told prototype’s AJAX request, Opera insisted on concatenating the ACCEPTed mimetypes with the original request’s mimetypes. I hacked around that by throwing in a fake mimetype to separate the requests I wanted from those Opera wants; see the template above and the code below.

So anyway, maybe it would be better, or at least more Django, to be explicit about these AJAX requests, and either give them different URLs (and factor common code out of the various views) or add a piece of get/post data, as here. For now I’ll keep doing what I’m doing, and see if I run into problems.

Here’s the Responder code. It has numerous shortcomings, so use at your own risk. It is completely non-bulletproof (and non-debugged), and won’t work if you don’t use it just like I wanted to use it (e.g. you’d better give it a template name). It obviously needs more mimetype knowledge—it falls back on python’s mimetype library, but that seems seriously unacceptable here. And I’m very lame about how I parse the HTTP_ACCEPT strings.

import sys
import re
import os, os.path, mimetypes
import django
from django.http import HttpResponse
from django.shortcuts import render_to_response
from django.template import RequestContext

class _ResponseHelper(object):
    def __init__(self, ext, mimetypes, responder):
        self.responder = responder
        self.ext = ext
        self.mimetypes = mimetypes
        self.fn = None

    def __call__(self, fn=None):
        self.fn = fn
        return fn

class Responder(object):
    """
    Utility for 'RESTful' responses based on requested mimitypes,
    in the request's HTTP_ACCEPT field, a la Reils' respond_to.

    To use, create a responder object.  Pass it the request object
    and the same arguments you would pass to render_to_response.
    Omit the file extension from the template name---it will be added
    automatically.
    For each type to be responded to, reference an attribute of the
    appropriate name (html, js, etc).
    Call the respond function to create a response.
    The response will be created by appending the extension to the filename
    and rendering to response, with the appropriate mimetype.

    To override the default behavior for a given type, treat its
    attribute as a function, and pass a function to it.
    It will be called with the same arguments as the Responder's constructor.
    If the function can modify the passed data, and either return None
    (in which case the template handling proceeds), or return a response.
    Function decorater syntax is a convenient way to do this.

    Example:

        responder = Responder(request, 'mytemplate', { 'foo': 'bar' })

        responder.html

        @responder.json
        def jsonresp(request, templ, data):
            data['foo' : 'baz']

        @responder.js
        def jsresp(request, templ, data):
            return HttpResponse(someJavascript,
                mimetype='application/javascript')

        return responder.response()

    Here an html request is processed as usual.
    A JSON request is processed with changed data.
    A JS request has its own response.

    """
    types = { 'html' : ('text/html',),
              'js' : ('text/javascript',
                      'application/javascript',
                      'application/x-javascript'),
              'json' : ('application/json',),
            }

    def __init__(self, request, *args, **kwargs):
        self.request = request
        self.resp = None
        self.args = [a for a in args]
        self.kwargs = kwargs
        self.priorities = {}
        for t, q in parseAccept(request):
            self.priorities.setdefault(t, q)
        self.defq = self.priorities.get('*/*', 0.0)
        self.bestq = 0.0

    def maybeadd(self, resp):
        try:
            thisq = self.bestq
            for mt in resp.mimetypes:
                q = self.priorities.get(mt, self.defq)
                if q > thisq:
                    resp.mimetype = mt
                    self.resp = resp
                    self.bestq = q
        except:
            pass

    def response(self):
        if self.resp:
            if self.resp.fn:
                result = self.resp.fn(self.request, *self.args, **self.kwargs)
                if result:
                    return result

            # the template name ought to be the first argument
            templ = self.args[0]
            base, ext = os.path.splitext(templ)
            if not ext:
                templ = "%s.%s" % (base, self.resp.ext)
            self.args[0] = templ
            self.kwargs['mimetype'] = self.resp.mimetype
        # if there wasn't a response, default to here
        response = render_to_response(
                context_instance=RequestContext(self.request),
                *self.args, **self.kwargs)
        return response

    def __getattr__(self, attr):
        mtypes = None
        if attr not in self.types:
            mtypes = [mt for mt, enc in [mimetypes.guess_type('.'+attr)]
                        if mt]
        else:
            mtypes = self.types[attr]
        if mtypes:
            resp = _ResponseHelper(attr, mtypes, self)
            self.maybeadd(resp)
            return resp
        else:
            return None

def parseAccept(request):
    """
    Turn a request's HTTP_ACCEPT string into a list
    of mimetype/priority pairs.
    Includes a hack to work around an Opera weirdness.
    """
    strings = request.META['HTTP_ACCEPT'].split(',')
    r = re.compile(r'(.*?)(?:\s*;\s*q\s*\=\s*(.*))?$')
    types = []
    for s in strings:
        m = r.match(s)
        q = float(m.groups()[1]) if m.groups()[1] else 1.0
        t = m.groups()[0].strip()
        if t == 'opera/hack':
            break
        types.append((t, q))
    return types

Some HTML text utilities

May 8, 2008

I’ve just added some utilities to my website:

  • A converter that takes HTML entities to/from the characters they represent. The input can be actual characters (e.g þ), named entity references (&thorn;), or numeric references, decimal or hexadecimal. It also accepts some abbreviations (two back-ticks for “, for example), which I’ll eventually document. Maybe.
  • Lorem Ipsum text, with settable font, font size, and line height.
  • A list of named HTML entities. (Yes, that’s easy to find, but I wanted a place I could get to easily.)

[Yes, I know all these things are easily available many places. I wanted to be able to get to them without having to think about it, and to be able to fiddle with the details.]

I wrote these for a target user base of one—me—so there’s no particular reason to think that they’ll be useful for anyone else. They’re also in a bit of a raw and unfinished state (in which they’ll stay until I get around to doing something about it). But hey, use them if you like.

All the entity names and unicode descriptions come from the python unicodedata and htmlentitydefs libraries. I love the way python includes stuff like that.

Django QuerysetRefactor

April 27, 2008

In major Django news, Malcolm Tredinnick‘s long-awaited QuerysetRefactor branch is in for real; huzzah!  This has little immediate impact on my tiny site.   It did allow (and require, as I expected) me to remove the QLeftOuterJoin workaround from Django Snippets I used in a couple of places.  It also fixes other problems I’ve run into before—with ordering across relations, for example—and looks to be a major nicification in general.    I’m very impressed that so major an internals change could be done with so few backwards incompatibilities.

Ruby-like expression substitution in Python

February 28, 2008

I don’t know much Ruby, and probably won’t learn; all that syntax and magic scare me away. But I have to admit it has some darned useful gadgets. Here’s a python function I hacked up to do something much like Ruby’s expression-substitution, using the same #{ } syntax. It doesn’t allow curly braces inside the #{ }; were I a little less lazy I would put in some escaping.


import re
import sys

def esub(s):
    """
    Perform Ruby-like expression substitution.

    >>> x=3
    >>> y='A'
    >>> esub('abc#{x}def#{3+5}hij#{"".join([y, y])}')
    'abc3def8hijAA'
    """
    restr = r'(?:#{(?P[^{}]*)})|(?:[^#])+|#'
    fr = sys._getframe(1)
    def process(m):
        txt = m.group('exp')
        if txt is not None:
            val = eval(txt, fr.f_globals, fr.f_locals)
            return type(s)(val)
        else:
            return m.group()
    return ''.join(process(m) for m in re.finditer(restr, s))

Authentication and Browser Caching in Django, part II

January 12, 2008

The other day I wrote about turning off browser caching when a user is logged in. Since I’m apparently a clueless n00b, it only occurred to me later that this is the sort of thing belongs in middleware. That way you don’t have to modify individual views, and it works for flatpages as well. Here’s the middleware; it should go in MIDDLEWARE_CLASSES before sessions and flatpages:

import re

def _add_to_header(response, key, value):
    if response.has_header(key):
        values = re.split(r'\s*,\s*', response[key])
        if not value in values:
            response[key] = ', '.join(values + [value])
    else:
        response[key] = value

def _nocache_if_auth(request, response):
    if request.user.is_authenticated():
        _add_to_header(response, 'Cache-Control', 'no-store')
        _add_to_header(response, 'Cache-Control', 'no-cache')
        _add_to_header(response, 'Pragma', 'no-cache')
    return response

class NoCacheIfAuthenticatedMiddleware(object):
    def process_response(self, request, response):
        try:
            return _nocache_if_auth(request, response)
        except:
            return response

Oh, and an annoying note: it’s still possible for firefox to keep an authenticated page cached, I can get that to happen with a sequence of Back and Reloads. Maybe that’s because the Back button is trying to respect history rather than the cache? Oh well, I told you not to mistake this for a security fix.

Authentication and browser caching in Django

January 10, 2008

While adding bits of authentication niceness to my website, I noticed a bit of ugliness. If I logged in, looked at a page that took account of the login, logged out, and hit Back in the browser, I still saw the logged-in page. That’s because the browser cached it, and just redisplayed on Back. I don’t care so much about caching non-authenticated views, but it just seems wrong to cache authenticated ones, so I have (I hope!) disabled it.

What disabling caching requires (here‘s a tutorial) is adding the directives Cache-Control:no-cache and Pragma:no-cache (different browsers may pay attention to one or the other!) to the html header, which is easily done. I have two methods for this; one is a replacement for render_to_response (which I was already using, to simplify using RequestContext), and the other is a wrapper for existing view functions, which can be used as a decorator. Code below.

Two important notes:

  • This should not be considered any sort of security fix.
  • I am not using the cache middleware; my site is way too small to need it. As far as I can tell this shouldn’t interact badly with the cache middleware, but sure don’t promise that.

Here’s the code:

def _add_to_header(response, key, value):
    if response.has_header(key):
        values = re.split(r'\s*,\s*', response[key])
        if not value in values:
            response[key] = ', '.join(values + [value])
    else:
        response[key] = value

def _nocache_if_auth(request, response):
    if request.user.is_authenticated():
        _add_to_header(response, 'Cache-Control', 'no-store')
        _add_to_header(response, 'Cache-Control', 'no-cache')
        _add_to_header(response, 'Pragma', 'no-store')
    return response

def rtr(request, *args, **kwargs):
    """
    If the request includes an authenticated user, disable browser caching.
    """
    response = render_to_response(
            context_instance=RequestContext(request),
            *args, **kwargs)
    return _nocache_if_auth(request, response)

def nocache_if_authenticated(fn):
    """
    Wrap the given view function so that browser caching is disabled
    with authenticated users.
    """
    def wrapped(request, *args, **kwargs):
        response = fn(request, *args, **kwargs)
        return _nocache_if_auth(request, response)
    return wrapped

UPDATE: D’oh! This should be middleware. I’ll implement that this weekend.

Spellchecking in python

January 7, 2008

FWIW, here’s the script I threw together to extract the wordlist I mentioned in the previous post:

#! /usr/bin/env python2.5

from __future__ import with_statement

import os
import re
import sys

from optparse import OptionParser

def worditer(wordsin, dict = None):
    r = re.compile(r'[#&]')
    if dict:
        cmd = 'aspell -a --lang=%s' % dict
    else:
        cmd = 'aspell -a'
    i, o = os.popen2(cmd)
    # skip first line
    o.readline()
    for w in wordsin:
        if w:
            i.write(w + '\n')
            i.flush()
            result = o.readline()
            if result and result != '\n':
                o.readline()
                if r.match(result):
                    # add the word for this session
                    i.write('@%s\n' % w)
                    yield w

def dowords(wordsin, outstr, dict):
    for w in worditer(wordsin, dict):
        outstr.write(w + '\n')

def filewordsiter(filenames):
    regex = re.compile(r'\W*')
    for fname in filenames:
        with open(fname) as f:
            for line in f:
                for w in regex.split(line):
                    yield w

def dofiles(filenames, outstream, dict):
    dowords(filewordsiter(filenames), outstream, dict)

def main():
    parser = OptionParser()
    parser.add_option('-d', '--dict', dest = 'dict',
                        help = 'Dictionary to use')
    parser.add_option('-o', '--out', dest = 'outfile',
                        help = 'Output file, stdout if none')
    options, filenames = parser.parse_args()
    if options.outfile:
        outstr = open(options.outfile, "w")
    else:
        outstr = sys.stdout
    dofiles(filenames, outstr, options.dict)

if __name__ == '__main__':
    main()