Tidy Allocators Patch

This patch should apply cleanly against the September 2003 version of tidy. It combines a number of earlier patches that correct memory leaks and several bugs that can cause access violations, and adds support for per-document allocators.

Any comments or suggestions would be appreciated; send them to mark@npsl.co.uk. See more free stuff by Mark here.

Download

Initial version:
allocators.zip (24k)

Commentary

Excerpts from the mail posted to the tidy-dev list, with commentary:

Here's the gist:

/* Forward declaration so the vtable can refer to the allocator type. */
typedef struct _TidyAllocator TidyAllocator;

typedef struct _TidyAllocatorVtbl {
    void* (*alloc)( TidyAllocator *self, size_t nBytes );
    void* (*realloc)( TidyAllocator *self, void *block, size_t nBytes );
    void (*free)( TidyAllocator *self, void *block );
    void (*panic)( TidyAllocator *self, ctmbstr msg );
} TidyAllocatorVtbl;

struct _TidyAllocator {
    const TidyAllocatorVtbl *vtbl;
};

To declare a custom allocator, you do:

typedef struct _MyAllocator {
    TidyAllocator base;
    /* other state */
} MyAllocator;

then:

MyAllocator myAlloc;
myAlloc.base.vtbl = &myVtbl;

(with myVtbl declared somewhere). Each callback always gets back a pointer to the allocator itself as its first argument, which it can cast to the correct type (MyAllocator above) and then use to access state as appropriate.
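As an illustration of the pattern, here is a minimal sketch of an allocator that counts live blocks. It repeats the interface from the gist so that it compiles on its own. CountingAllocator, countingVtbl and countingAllocatorInit are hypothetical names invented for this example, and ctmbstr is assumed to be a const char pointer; none of this is taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: ctmbstr is tidy's constant string type, modelled as const char*. */
typedef const char *ctmbstr;

/* Interface as in the gist, repeated so this sketch stands alone. */
typedef struct _TidyAllocator TidyAllocator;

typedef struct _TidyAllocatorVtbl {
    void* (*alloc)( TidyAllocator *self, size_t nBytes );
    void* (*realloc)( TidyAllocator *self, void *block, size_t nBytes );
    void (*free)( TidyAllocator *self, void *block );
    void (*panic)( TidyAllocator *self, ctmbstr msg );
} TidyAllocatorVtbl;

struct _TidyAllocator {
    const TidyAllocatorVtbl *vtbl;
};

/* Hypothetical allocator that tracks how many blocks are outstanding. */
typedef struct _CountingAllocator {
    TidyAllocator base;   /* first member, so self can be cast back */
    long liveBlocks;
    int panicked;
} CountingAllocator;

static void* countingAlloc( TidyAllocator *self, size_t nBytes )
{
    CountingAllocator *a = (CountingAllocator*) self;
    void *p = malloc( nBytes );
    if ( p )
        a->liveBlocks++;
    return p;
}

static void* countingRealloc( TidyAllocator *self, void *block, size_t nBytes )
{
    CountingAllocator *a = (CountingAllocator*) self;
    void *p = realloc( block, nBytes );
    if ( p && !block )    /* realloc(NULL, n) creates a new block */
        a->liveBlocks++;
    return p;
}

static void countingFree( TidyAllocator *self, void *block )
{
    CountingAllocator *a = (CountingAllocator*) self;
    if ( block ) {
        a->liveBlocks--;
        free( block );
    }
}

static void countingPanic( TidyAllocator *self, ctmbstr msg )
{
    CountingAllocator *a = (CountingAllocator*) self;
    a->panicked = 1;
    fprintf( stderr, "allocator panic: %s\n", msg );
}

static const TidyAllocatorVtbl countingVtbl = {
    countingAlloc, countingRealloc, countingFree, countingPanic
};

/* Wire up the vtable and reset the counters. */
void countingAllocatorInit( CountingAllocator *a )
{
    assert( a != NULL );
    a->base.vtbl = &countingVtbl;
    a->liveBlocks = 0;
    a->panicked = 0;
}
```

After countingAllocatorInit( &a ), a pointer to a.base is what would be handed to the library. A non-zero liveBlocks once the document and its buffers have been released would point to a leak.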

New versions of existing functions are available:

tidyCreateWithAllocator
tidyBufInitWithAllocator
tidyAllocBufWithAllocator

The allocators for user-provided buffers and documents can be different (AFAICT; I haven't tested this assertion in detail).

Other things to note are:

There are a couple of extra fixes on top of the original cleanup:

The final issue is what to do with file handles and suchlike. This is not something that directly affects the patch as such; rather, the only way to carry on after a malloc failure is to use setjmp/longjmp to get out of it. This potentially leaves files open (along with their associated structures), which is a pretty fatal kind of leak (you tend to run out of file handles faster than memory). You can get around this by managing all of the files for tidy yourself (passing them as custom sinks/sources), but it would be nice not to have to. The only clean way I can see around this is to decorate nearly _everything_ that allocates memory, and have it potentially return a failure. Then, at the top level, you can return ENOMEM from the API call.
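The setjmp/longjmp escape described above can be sketched roughly as follows. This is an illustration under assumptions, not code from the patch: JumpAllocator, libAlloc and processWithBudget are hypothetical names, libAlloc stands in for library code that calls panic when an allocation fails, and ctmbstr is assumed to be a const char pointer. The file is opened by the caller before setjmp, so it can still be closed on the failure path; a file opened deeper inside the library would be lost.

```c
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

typedef const char *ctmbstr;   /* assumption: tidy's constant string type */
typedef struct _TidyAllocator TidyAllocator;

typedef struct _TidyAllocatorVtbl {
    void* (*alloc)( TidyAllocator *self, size_t nBytes );
    void* (*realloc)( TidyAllocator *self, void *block, size_t nBytes );
    void (*free)( TidyAllocator *self, void *block );
    void (*panic)( TidyAllocator *self, ctmbstr msg );
} TidyAllocatorVtbl;

struct _TidyAllocator {
    const TidyAllocatorVtbl *vtbl;
};

/* Hypothetical allocator: fails once a byte budget is spent, and its
   panic handler jumps back to wherever setjmp was called. */
typedef struct _JumpAllocator {
    TidyAllocator base;
    jmp_buf escape;
    size_t budget;
} JumpAllocator;

static void* jumpAlloc( TidyAllocator *self, size_t nBytes )
{
    JumpAllocator *a = (JumpAllocator*) self;
    if ( nBytes > a->budget )
        return NULL;               /* simulated out-of-memory */
    a->budget -= nBytes;
    return malloc( nBytes );
}

static void* jumpRealloc( TidyAllocator *self, void *block, size_t nBytes )
{
    (void) self;                   /* not budgeted, to keep the sketch short */
    return realloc( block, nBytes );
}

static void jumpFree( TidyAllocator *self, void *block )
{
    (void) self;
    free( block );
}

static void jumpPanic( TidyAllocator *self, ctmbstr msg )
{
    JumpAllocator *a = (JumpAllocator*) self;
    fprintf( stderr, "panic: %s\n", msg );
    longjmp( a->escape, 1 );       /* does not return */
}

static const TidyAllocatorVtbl jumpVtbl = {
    jumpAlloc, jumpRealloc, jumpFree, jumpPanic
};

/* Stand-in for library code that panics when allocation fails. */
static void* libAlloc( TidyAllocator *alloc, size_t nBytes )
{
    void *p = alloc->vtbl->alloc( alloc, nBytes );
    if ( !p )
        alloc->vtbl->panic( alloc, "Out of memory" );
    return p;
}

/* Returns 0 on success, or ENOMEM if the allocator panicked. */
int processWithBudget( size_t budget, size_t request )
{
    JumpAllocator a;
    void *block;
    FILE *fp = tmpfile();          /* owned here, opened before setjmp */

    a.base.vtbl = &jumpVtbl;
    a.budget = budget;

    if ( setjmp( a.escape ) != 0 ) {
        /* Arrived via longjmp: only resources owned by this frame
           can be released here. */
        if ( fp )
            fclose( fp );
        return ENOMEM;
    }

    block = libAlloc( &a.base, request );
    a.base.vtbl->free( &a.base, block );
    if ( fp )
        fclose( fp );
    return 0;
}
```

For example, processWithBudget( 1024, 100 ) returns 0, while processWithBudget( 10, 100 ) takes the longjmp path and returns ENOMEM with the file closed.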
