Foolish Assertions: code


Saturday, 20 March 2010

A Crime Against Nature

Every so often, while writing Python, I've found myself wishing I could easily dispatch method calls according to the type of the arguments. The urge usually passes quickly, but... oh, the hell with it, there's no point trying to justify what I've done. Just look:


>>> import bondage
>>> 
>>> class C(object):
...   @bondage.discipline(int)
...   def foo(self, arg):
...     print 'int'
...   @foo.discipline(str)
...   def foo(self, arg):
...     print 'str'
...   @foo.discipline(int, str, int)
...   def foo(self, arg1, arg2, arg3):
...     print 'int, str, int'
... 
>>> c = C()
>>> c.foo(1)
int
>>> c.foo('a')
str
>>> c.foo(1, 'a', 1)
int, str, int
>>> c.foo([])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "bondage.py", line 18, in <lambda>
    return lambda *args: self._dispatch(obj, *args)
  File "bondage.py", line 22, in _dispatch
    return self._argspecs[argspec](obj, *args)
KeyError: (<type 'list'>,)
>>>

I'd like to make it clear that there is absolutely no excuse for perpetrating this sort of insanity, ever. With that said, here's how I did it:


class discipline(object):
    
    def __init__(self, *argspec):
        self._argspecs = {}
        self.discipline(*argspec)
    
    def discipline(self, *argspec):
        self._argspec = argspec
        return self
    
    def __call__(self, f):
        self._argspecs[self._argspec] = f
        return self

    def __get__(self, obj, objtype=None):
        return lambda *args: self._dispatch(obj, *args)

    def _dispatch(self, obj, *args):
        argspec = tuple(map(type, args))
        return self._argspecs[argspec](obj, *args)

Obviously it's a stupid implementation, and if you wanted to do this properly you'd have to pay attention to subtypes, and do something clever with numeric types, and... oh, God, what am I saying?

Enough!

If you really want to do this "properly", use some other language where it's already built in, and begone.
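For the morbidly curious, the subtype half of "properly" might look something like the sketch below. It is not the bondage module from this post: it swaps the exact-type dictionary lookup for an isinstance() scan over the registered signatures, lets the first registered match win, and raises TypeError rather than KeyError when nothing fits. The methods return strings instead of printing so that it runs unchanged on Python 3, and the MyStr class is invented purely for the demonstration. It still does nothing clever with numeric types.

```python
class discipline(object):
    """Illustrative variant: dispatch on isinstance() rather than exact type."""

    def __init__(self, *argspec):
        self._impls = []  # (argspec, function) pairs, in registration order
        self.discipline(*argspec)

    def discipline(self, *argspec):
        self._argspec = argspec
        return self

    def __call__(self, f):
        self._impls.append((self._argspec, f))
        return self

    def __get__(self, obj, objtype=None):
        return lambda *args: self._dispatch(obj, *args)

    def _dispatch(self, obj, *args):
        # Linear scan; the first registered signature that fits wins.
        for argspec, f in self._impls:
            if len(argspec) == len(args) and all(
                    isinstance(arg, t) for arg, t in zip(args, argspec)):
                return f(obj, *args)
        raise TypeError('no implementation for %r' % (tuple(map(type, args)),))


class C(object):
    @discipline(int)
    def foo(self, arg):
        return 'int'

    @foo.discipline(str)
    def foo(self, arg):
        return 'str'


class MyStr(str):  # hypothetical subtype, only here to exercise the dispatch
    pass
```

With this, C().foo(MyStr('a')) finds the str implementation. C().foo(True) finds the int one, because bool is a subclass of int, which is exactly the sort of thing that makes registration order matter whenever two signatures overlap.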

Tuesday, 2 March 2010

The Joy of Self

I have a distant and fuzzy memory, from back in the day... when I was but a wee slip of a lad, sallying forth to do battle with million-line C++ monstrosities (and just barely escaping with my sanity intact purple monkey dishwasher), I came upon a Path class. It was probably called CPath: classes were new and shiny, so far as any of us knew at the time, and absolutely deserved a prefix to underline their special status. Look, ma, I'm programming Object Orientedly!

And, yeah, it was horrible. Big, clunky, confusing... I'd like to say that the blistering speed made up for it all, but that would be entirely untrue. It was nasty.

As a result of this, I had never since felt the slightest urge to create a Path class of my own... until today. I was trying to get an overgrown build script under control, and - despite my scars - it suddenly seemed like a good idea.

So I had a go.

And... well, this sort of thing is why I love Python, and is also the clearest illustration I've yet seen of why explicit self is a Good Thing. The following code is, as usual, hacked up from memory and may therefore contain hilariously deadly bugs; caveat lector.


import os, shutil

class Path(str):

    def __new__(cls, path):
        abs_ = os.path.abspath(path)
        norm = os.path.normpath(abs_)
        return str.__new__(cls, norm)

    exists = property(os.path.exists)
    isfile = property(os.path.isfile)
    isdir = property(os.path.isdir)
    # etc...

    move = shutil.move
    # os.listdir is a builtin, and builtins don't bind as methods, so wrap it
    def listdir(self):
        return os.listdir(self)
    # etc...

    def __getattr__(self, name):
        return self.join(name)

    def join(*args):
        return Path(os.path.join(*args))

    def delete(self):
        if self.isdir:
            shutil.rmtree(self)
        else:
            os.remove(self)
    # etc...

Now, IMO, this was a massive win: I made the client code a lot less verbose, and hence clearer, and I did most of the work by trivially subclassing a builtin type and dropping in a bunch of standard library functions as methods. The real one has many more bells and whistles (in fact, I think I got a bit carried away) but hopefully you get the idea.

The crucial point is that I couldn't have done it so neatly without explicit self; also, amusingly, most of the explicit selfs in this class are in fact implicit. So there.
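To make that last remark concrete, here is a cut-down sketch of the same idea. It is written for Python 3 and runs against a throwaway temporary directory; the directory names (build, nothing_here) are invented for the demonstration, and the class is trimmed to the parts needed to show the binding behaviour.

```python
import os
import shutil
import tempfile


class Path(str):
    """Cut-down version of the class above, for demonstration only."""

    def __new__(cls, path):
        return str.__new__(cls, os.path.normpath(os.path.abspath(path)))

    # property() calls its getter with the instance: os.path.isdir(self)
    isdir = property(os.path.isdir)
    exists = property(os.path.exists)

    # A plain Python function stored on a class binds like any other method,
    # so the instance arrives as the first positional argument.
    def join(*args):
        return Path(os.path.join(*args))

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        return self.join(name)


root = Path(tempfile.mkdtemp())
build = root.build                    # same as root.join('build')
os.mkdir(build)
made_dir = build.isdir                # reads like an attribute, calls os.path.isdir
missing = root.nothing_here.exists    # a path that was never created
shutil.rmtree(root)
```

Nothing here ever spells out self for join, isdir or exists: the instance is passed along because that is simply how functions stored on a class behave, and str supplies everything else.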

Thursday, 14 January 2010

Where's a plumber when you need one?

Assertion: On Win32, there's no point bothering with subprocess.PIPE -- just use tempfile.TemporaryFile instead.

Context: You create a Popen(cmd, stdout=PIPE, stderr=PIPE), and you let it run (with a timeout); sometimes it completes successfully, which is cool, and sometimes it doesn't, in which case you read stdout and stderr and try to figure out what went wrong. This is all fine and dandy until one day you add a *little* bit more logging to the tool you're calling, and it suddenly wedges forever.

Explanation: Of course, this is because you've filled up some buffer, which you should have been periodically emptying. However, you can't just select() and read one byte at a time, because select doesn't work with pipes on Win32; you can't just read() what's there, because it blocks and stops the timeout from working; you don't want to spin off another thread to do your reading, because that involves tedious extra code and feels like killing a fly with a sledgehammer; and you don't want to screw around with readline() because that also involves tedious bookkeeping and extra code.

Solution: So, just do the following (warning, coded from memory):


from subprocess import Popen
from tempfile import TemporaryFile
from time import time, sleep

def assert_runs(cmd, timeout=10):
    out = TemporaryFile()
    err = TemporaryFile()
    end_time = time() + timeout
    process = Popen(cmd, stdout=out, stderr=err)
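    while process.poll() is None:
        sleep(0.1)
        if time() > end_time:
            process.terminate()
            process.wait()  # reap it, so we only terminate once
    if process.returncode != 0:
        # the child wrote through the same files, leaving the position
        # at the end; rewind before reading
        out.seek(0)
        err.seek(0)
        raise AssertionError('%s FAILED (%s)\nstdout:\n%s\nstderr:\n%s' % (
            cmd, process.returncode, out.read(), err.read()))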

Indeed, it's icky to fill up your hard disk rather than some internal buffer, but you can get a lot more done before you run out of HD space. Now, this surely feels nasty, but IMO it's slightly less nasty (or, at least, less code) than anything else I've tried. Hopefully you, gentle reader, have an infinitely superior solution that you will detail in the comments. Surprise me!

Please note that "Don't use Windows, har har", and variants thereof, fail to qualify as "surprising" ;-).
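As a quick check that the approach really does survive a chatty child, here is a minimal Python 3 sketch of the same technique. The run_captured name, the return-a-tuple interface and the 1 MB figure are all assumptions made for the demonstration; the child is just the current interpreter writing more to stdout than a pipe buffer would typically hold.

```python
import sys
from subprocess import Popen
from tempfile import TemporaryFile
from time import monotonic, sleep


def run_captured(cmd, timeout=10):
    """Run cmd and return (returncode, stdout_bytes, stderr_bytes)."""
    with TemporaryFile() as out, TemporaryFile() as err:
        process = Popen(cmd, stdout=out, stderr=err)
        deadline = monotonic() + timeout
        while process.poll() is None:
            if monotonic() > deadline:
                process.terminate()
                process.wait()  # make sure returncode is set before we leave
                break
            sleep(0.05)
        # The child wrote through the same files, so rewind before reading.
        out.seek(0)
        err.seek(0)
        return process.returncode, out.read(), err.read()


# A child that writes about 1 MB to stdout and then exits normally.
chatty = [sys.executable, '-c', "import sys; sys.stdout.write('x' * 1000000)"]
code, stdout, stderr = run_captured(chatty)
```

Because the output lands in a file rather than a pipe, nothing needs to drain it while the child runs, so the polling loop stays as dumb as it looks.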

Thursday, 24 December 2009

Spare batteries for IronPython

As we all know, Python comes with batteries included in the form of a rich standard library; and, on top of this, there are many awesome and liberally-licensed packages just an easy_install away.

IronPython, of course, includes *most* of the CPython standard library, but if you're a heavy user you might have noticed a few minor holes: in the course of my work on Ironclad, I certainly have. Happily for you I can vaguely remember what I did in the course of bodging them closed with cow manure and chewing gum; here then, for your edification and delectation, is my personal recipe for a delicious reduced-hassle IronPython install, with access to the best and brightest offered by CPython, on win32.

Install IronPython 2.6.

Download Jeff Hardy's zlib for IronPython and copy IronPython.Zlib.dll into IronPython's DLLs subdirectory (create it if it doesn't exist).

Download Jeff Hardy's subprocess.py for IronPython and copy it into IronPython's site-packages subdirectory.

Download Ironclad, and copy the ironclad package into IronPython's site-packages subdirectory. Yeah, maybe I'll sort out an installer one day, but don't hold your breath.

Install CPython 2.6.

Add CPython's DLLs subdirectory to your IRONPYTHONPATH environment variable.

Copy csv.py, gzip.py, and the sqlite3 directory from CPython's Lib subdirectory to IronPython's site-packages subdirectory.

Copy xml/sax/expatreader.py from CPython's Lib subdirectory to the corresponding location in IronPython's Lib subdirectory.

Download FePy's pyexpat.py, copy it to IronPython's Lib/xml/parsers subdirectory, and rename it to expat.py.

Download and install NumPy 1.3.0 and SciPy 0.7.1 for CPython, and copy them from CPython's site-packages subdirectory to IronPython's.

...and you're done. Start your ipy sessions with a snappy 'import ironclad', and enjoy.
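The pure file-copying steps above are easy enough to script. What follows is a rough sketch only, in Python 3 for convenience: the function name is made up, both install directories are passed in rather than guessed, and Lib/site-packages as the home of IronPython's site-packages is an assumption to check against your own install. The downloads, the installers and the environment variable are left to you.

```python
import os
import shutil


def copy_cpython_batteries(cpython_root, ipy_root):
    """Copy the pure-Python modules named in the steps above.

    Both arguments are install directories. 'Lib/site-packages' as the
    location of IronPython's site-packages is an assumption.
    """
    cpy_lib = os.path.join(cpython_root, 'Lib')
    ipy_lib = os.path.join(ipy_root, 'Lib')
    ipy_site = os.path.join(ipy_lib, 'site-packages')
    if not os.path.isdir(ipy_site):
        os.makedirs(ipy_site)

    for name in ('csv.py', 'gzip.py'):
        shutil.copy(os.path.join(cpy_lib, name), ipy_site)

    # copytree creates the destination and refuses to overwrite an existing one.
    shutil.copytree(os.path.join(cpy_lib, 'sqlite3'),
                    os.path.join(ipy_site, 'sqlite3'))

    # Assumes IronPython's Lib/xml/sax directory already exists.
    sax = os.path.join('xml', 'sax', 'expatreader.py')
    shutil.copy(os.path.join(cpy_lib, sax), os.path.join(ipy_lib, sax))
```

Running it twice will fail at the sqlite3 step, since copytree will not overwrite an existing directory; delete the old copy first if you are refreshing an install.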

Incidentally, you could just add CPython's site-packages to your IRONPYTHONPATH, and then you wouldn't have to copy extra packages over; the reason I don't do that is that having matplotlib on your path currently breaks scipy under ironclad -- can't remember exactly why -- and it's nice to have matplotlib installed for CPython.

Oh, and let me know if I've made any mistakes above: I just hacked this post together from slightly aged notes, and I'm too lazy to tear down and rebuild my environment to check that every detail is perfect.

Thursday, 10 September 2009

.NET Marshalling: a mildly sarcastic Q&A

1) Want to read or write a chunk of unmanaged memory that you know holds a double?

Easy! Just use Marshal.ReadDou... oh. Hmm.

OK, it seems that -- while doubles work fine in arguments, return values and struct fields -- the Marshal.Read/WriteDouble methods presumably got left in some internal backwater and never made it to the public interface. I'm not sure why this would be -- it seems quite the oversight -- but perhaps it's intended as an oblique philosophical statement: that if anyone needs to use unmanaged doubles directly then they are somehow Doing It Wrong.

Regardless, it may be that you really do need to read or write the odd unmanaged double, in which case you can just subvert the existing paths that do work with doubles. The two obvious options are Marshal.Copy (which has a mind-boggling range of overloads, all of which expect arrays); and PtrToStructure/StructureToPtr, which need you to define a struct containing a single double and read/write that.

Alternatively, you could write a couple of trivial functions in C:


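/* Export both functions from the DLL (__declspec(dllexport) or a .def file),
   or GetProcAddress won't find them. */
double ReadDouble(double* address)
{
    return *address;
}

void WriteDouble(double* address, double value)
{
    *address = value;
}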

...and, once you've loaded the resulting dll, and acquired the pointers to those functions, you can use Marshal.GetDelegateForFunctionPointer to make them available to your .NET code.


[DllImport("kernel32.dll")]
public static extern IntPtr LoadLibrary(string _);
[DllImport("kernel32.dll")]
public static extern IntPtr GetProcAddress(IntPtr _, string __);

[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate double dgt_ReadDouble(IntPtr _);

[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate void dgt_WriteDouble(IntPtr _, double __);

...

public dgt_ReadDouble ReadDouble;
public dgt_WriteDouble WriteDouble;

void Init(string path)
{
    IntPtr lib = LoadLibrary(path);
    IntPtr fpRead = GetProcAddress(lib, "ReadDouble");
    ReadDouble = (dgt_ReadDouble)Marshal.GetDelegateForFunctionPointer(
        fpRead, typeof(dgt_ReadDouble));
    IntPtr fpWrite = GetProcAddress(lib, "WriteDouble");
    WriteDouble = (dgt_WriteDouble)Marshal.GetDelegateForFunctionPointer(
        fpWrite, typeof(dgt_WriteDouble));
}

I should point out that this approach is (1) moderately tedious to implement and (2) somewhat opaque to casual inspection, so I can't really recommend it in normal circumstances. Still, I wrote all that code -- I may as well post it.

2) Want to stub out unmanaged code with a managed delegate?

No problem: Marshal.GetFunctionPointerForDelegate returns a perfectly good function pointer, just as expected. Whether you then use it to neatly overwrite another function pointer somewhere, or just to poo a JMP instruction on top of the original implementation, is between you and your conscience.

However, there is at least one subtlety that may cause problems. What happens if unmanaged code somehow passes that function pointer back into managed code when you're not expecting it?

Let's say you're expecting a callback that can be converted to a FooDelegate. If you're given a genuine unmanaged function pointer, there's no problem: you can convert it to any delegate type you like (and, of course, suffer the consequences if you pick the wrong one). However, if you happen to be passed an unmanaged function pointer that was originally converted from some *other* managed delegate type, say a BarDelegate, you're out of luck -- the cast will fail.


[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate int FooDelegate(IntPtr _, IntPtr __);

[UnmanagedFunctionPointer(CallingConvention.Cdecl)]
public delegate int BarDelegate(IntPtr _, IntPtr __);

No matter that a FooDelegate has the same return type, parameter types, calling convention and star sign as a BarDelegate -- it seems that they are so fundamentally different from one another that any attempt to convert between them, however circuitous the path, must be forbidden. I can understand why; I just don't like it. Goddamn static typing weenies ;).

And, if you hit problems of this nature, there's really nothing you can do about it but to autogenerate a bunch of code to define *one* delegate type per signature, and to *always* use that delegate type for unmanaged functions with that signature. It's stupid and ugly and it sucks, and it has a terrible tendency to combine with nasty unmanaged interfaces to create names like 'dgt_ptr_ptrptrptrptrintintintintptrptrintptr', but it will help you get around this issue.
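The autogeneration can be sketched in a few lines of Python. This is an illustration, not the generator actually used: the type codes, their mapping to C# types, the arg0/arg1 parameter names and the reading of the naming scheme as return type followed by the concatenated argument types are all assumptions inferred from the example name above.

```python
# Hypothetical short type codes and the C# types they stand for.
CSHARP_TYPES = {
    'ptr': 'IntPtr',
    'int': 'int',
    'double': 'double',
    'void': 'void',
}


def delegate_name(ret, args):
    """Build one canonical name per signature, e.g. dgt_int_ptrptr."""
    return 'dgt_%s_%s' % (ret, ''.join(args))


def delegate_decl(ret, args):
    """Return the C# source for the delegate type with this signature."""
    params = ', '.join(
        '%s arg%d' % (CSHARP_TYPES[arg], i) for i, arg in enumerate(args))
    return (
        '[UnmanagedFunctionPointer(CallingConvention.Cdecl)]\n'
        'public delegate %s %s(%s);'
        % (CSHARP_TYPES[ret], delegate_name(ret, args), params))


# Both FooDelegate and BarDelegate above would collapse into this one type.
decl = delegate_decl('int', ['ptr', 'ptr'])
```

The point of deriving the name mechanically from the signature is that two call sites with the same signature cannot accidentally end up with two different delegate types.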

3) You've just called across the boundary for the first time, and it didn't crash, and you're feeling on top of the world?

Sweet! But, before you go any further, please make sure that the arguments you passed in actually arrived safely at the other end.

If they aren't exactly as you expect, you probably used the wrong calling convention (and weren't lucky enough to crash immediately). Doing this will mortally wound your stack, but the process will probably limp along for a while, until it finds a suitably misleading moment to explode messily. This explosion will be reproducible, but it will also be utterly bizarre, and the most apparently trivial of changes can lead to new explosions in apparently unrelated locations; you can easily lose a day or two chasing the stack pointer fairies. Not fun.

And, if you'd just checked your data in the first place, you might have noticed you were slinging garbage about *before* you set off on your little jaunt.

4) Want to use an unmanaged API that takes a FILE*?

I weep for your soul, but you can indeed do such a thing.

FileStream.SafeFileHandle.DangerousGetHandle will enable you to extract the underlying file handle from a managed file stream, after which you can get a FILE* with _open_osfhandle and _fdopen... but be advised, that 'Dangerous' there ain't just for show. The moment that unmanaged code operates on the file, the .NET stream becomes a massive and deadly liability: simple operations may fail silently, which is bad enough, but sometimes giant rocks fall from the sky and kill everyone.

So, don't do that, if you can possibly avoid it. Ideally, figure out some way to use unmanaged file handles throughout, and wrap them yourself for .NET if you have to.

5) So, you've taken my advice, and now you want to play around with unmanaged streams?

Go ahead. The aforementioned wrapping is completely irrelevant to this topic, so I leave that as an exercise for the reader (ha!). However, you probably need to be aware that you may be working with multiple versions of fopen, fread, fwrite, etc; one for each of the many versions of the Microsoft C runtime.

So, if and when you see inexplicable crashes when something passes an obviously valid FILE* to (say) fread, you should check whether the producing fopen and the consuming fread come from the same runtime. If not, you've found your problem; the solution is easily stated (use functions from matching runtimes) and might even be easily implemented. Your mileage may vary.

6) Want to know how to marshal C varargs into a .NET method?

Ideally, don't. The best solution I could come up with was frankly too evil to live, or ever to speak of again; and I say that as a man who has cold-bloodedly perpetrated every technique discussed in this post.

If you have a nice way to do it, I'd be interested to hear :-).

Thursday, 15 January 2009

More foolishness; now with added drunkenness

Well, it seems my last post was a year ago, and it therefore seems likely that I may actually be dead before I get around to making another. I wish this was a joke.

However, with that sobering realisation comes a silver lining: that, equally, I'll probably be dead before I notice anyone's response to my rantings, so I may as well assert away, as foolishly and offensively as I please. So:

1) Explicit static typing may be all very well if you're coding tedious predictable crap that's been done a million times before, but -- if you're trying to do anything even slightly interesting or worthwhile -- it's an ugly and stupid waste of your short and precious life.

2) Don't bother to disagree: my AcceptFeedback method only takes FawningAdulation instances.

3) No, seriously, don't bother. All the FawningAdulation class can do -- and, indeed, all it needs to do -- is tell me how awesome I am. Also, it's 'final' (that's in Java; 'sealed' and 'totally fucking useless' are the synonyms in C# and English), so you can't even subclass it and make it do something useful.

4) No! Shut up; I don't care; I designed this system perfectly in the first place, and the only reason you're complaining about the feedback API is because you're too stupid to use it properly. Go away.

If you take my point: good. If not, read this post again and again, and again, until it seeps in.
