Playing with asyncio

filed under python, twisted and asyncio

Lately I've been playing with asyncio, a new package being introduced in Python 3.4 for rebooted asynchronous IO support in the standard library.

It's very nice, and despite there being no documentation apart from the PEP at the moment, I've found it quite straightforward to work with. I thought I'd share some examples and compare it to my experiences with Gevent and Twisted.

While Gevent and Twisted aim to be higher-level frameworks, asyncio aims to be a lower-level implementation of an asynchronous event loop, with the intention that higher-level frameworks like Twisted, Gevent or Tornado will build on top of it. Even so, it makes a suitable framework on its own.

By providing a common event loop for all the major frameworks to plug into, the intent is that you can mix and match the different frameworks and have them just work.

Here are some quick examples of what the code looks like:

Asynchronous Sleep

Nothing fancy. Just sleep for 5 seconds. But without blocking the main loop.

import asyncio
import time

@asyncio.coroutine
def sleepy():
    print("before sleep", time.time())
    yield from asyncio.sleep(5)
    print("after sleep", time.time())

asyncio.get_event_loop().run_until_complete(sleepy())

Simple echo server

This is a simple echo server, used for showing you how to start a server with a given protocol. You can connect to it with telnet 127.0.0.1 4444. Everything you type into the telnet session will be sent back to you from the server.

import asyncio

loop = asyncio.get_event_loop()

# an instance of EchoProtocol will be created for each client connection.
class EchoProtocol(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)

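    def connection_lost(self, exc):
        # closing the server on the first disconnect lets the example exit;
        # a real server would keep listening for other clients.
        server.close()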

# run the coroutine to establish the server connection, then keep running
# the event loop until the server is stopped.
server = loop.run_until_complete(loop.create_server(EchoProtocol, '127.0.0.1', 4444))
loop.run_until_complete(server.wait_closed())

Async clamd client (virus scanner)

This is the longest example, but it's basically a clamd client. It will open the file named in sys.argv[1] and send it to clamd for scanning. It shows how you can use asyncio.Future objects to communicate between coroutines, and how you create client TCP connections.

import asyncio
import sys
import struct

loop = asyncio.get_event_loop()

class ClamAVProtocol(asyncio.Protocol):
    def __init__(self, future, payload):
        self.future = future
        self.payload = payload
        self.response_data = b''

    def connection_made(self, transport):
        self.transport = transport

        self.transport.write(b'nINSTREAM\n')

        size = struct.pack(b'!L', len(self.payload))
        self.transport.write(size + self.payload)
        self.transport.write(struct.pack(b'!L', 0))

    def data_received(self, data):
        self.response_data += data

        if b'\n' not in self.response_data:
            return

        self.transport.close()

        response = self.response_data.split(b'\n')[0]

        # set the result on the Future so that the main() coroutine can
        # resume
        if response.endswith(b'FOUND'):
            name = response.split(b':', 1)[1].strip()
            self.future.set_result((True, name))
        else:
            self.future.set_result((False, None))

def clamav_scan(payload):
    future = asyncio.Future()
    if payload:
        scanner = ClamAVProtocol(future, payload)

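        # kick off a task to create the connection to clamd.
        # (asyncio.async was renamed asyncio.ensure_future in Python 3.4.4,
        # and `async` is a reserved word from Python 3.7 onwards.)
        asyncio.async(loop.create_connection(lambda: scanner, host='127.0.0.1', port=3310))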
    else:
        future.set_result((False, None))

    # return the future for the main() coroutine to wait on.
    return future


@asyncio.coroutine
def main():
    with open(sys.argv[1], 'rb') as f:
        body = f.read()

    found_virus, name = yield from clamav_scan(body)

    if found_virus:
        print("Found a virus! %s" % name)
    else:
        print("No virus. Everything is safe.")

if __name__ == '__main__':
    loop.run_until_complete(main())
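
The framing that connection_made writes by hand can be pulled out into a small helper. This is only a sketch: frame_instream and its chunk_size parameter are names I've made up for illustration, and it assumes the wire format is exactly what the example above sends (the command, then length-prefixed chunks, then a zero-length chunk to finish).

```python
import struct


def frame_instream(payload, chunk_size=8192):
    """Build the bytes for one INSTREAM request (hypothetical helper)."""
    parts = [b'nINSTREAM\n']
    for start in range(0, len(payload), chunk_size):
        chunk = payload[start:start + chunk_size]
        # each chunk is prefixed with its length as a 4-byte big-endian integer
        parts.append(struct.pack('!L', len(chunk)) + chunk)
    # a zero-length chunk marks the end of the stream
    parts.append(struct.pack('!L', 0))
    return b''.join(parts)


framed = frame_instream(b'hello', chunk_size=4)
print(framed)
```

With something like this, connection_made would only need a single self.transport.write(frame_instream(self.payload)) call.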

There's obviously a lot more to it, but that's the basic gist of using the library. How does it compare to Twisted and Gevent, though?

Versus Twisted

As @eevee said:

That really does capture it quite well. asyncio's Protocol class provides much of the same interface as Twisted's Protocol class: you can pause and resume transports, and you have connection_made, data_received and connection_lost methods, among other things. So in terms of the Protocol/Transport API, Twisted and asyncio can be considered roughly the same. I won't go into it very much - you can read more in the PEP and draw your own conclusions.

Both libraries provide a way to defer blocking operations to threads, so that they don't block the main loop, and to communicate the results back to the main loop when done.
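
On the asyncio side this is done with the loop's run_in_executor method (Twisted's counterpart is deferToThread). Here is a minimal sketch, written with the async/await syntax of later Python versions (3.7+) rather than the yield from style used in the examples above. blocking_work is just a stand-in I made up for whatever blocking call you need to make.

```python
import asyncio
import threading
import time


def blocking_work(seconds):
    # stands in for any blocking call (file IO, a synchronous client library, ...)
    time.sleep(seconds)
    return threading.current_thread().name


async def main():
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool; the await resumes on the
    # event loop thread once the worker thread has finished.
    worker_name = await loop.run_in_executor(None, blocking_work, 0.1)
    return worker_name, threading.current_thread().name


worker_thread, loop_thread = asyncio.run(main())
print("blocking call ran in", worker_thread, "while the loop ran in", loop_thread)
```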

In Twisted, you typically plug things into the event loop by using methods on twisted.internet.reactor, like callLater. Similarly in asyncio, you plug things into the event loop by using the object returned by asyncio.get_event_loop(). Beware that in asyncio, though, you cannot pass a coroutine to any of the call_* methods. This bit me and took me a while to figure out.
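
One way around this, sketched here with the async/await syntax of later Python versions (3.7+), is to pass call_later a plain function that wraps the coroutine in a task. greet and start_greet are names made up for the example.

```python
import asyncio


async def greet(results):
    await asyncio.sleep(0)
    results.append("hello from the coroutine")


async def main():
    loop = asyncio.get_running_loop()
    results = []
    done = asyncio.Event()

    def start_greet():
        # a plain callback: it wraps the coroutine in a Task so the loop runs it
        task = loop.create_task(greet(results))
        task.add_done_callback(lambda _task: done.set())

    # call_later expects a plain callable, so the coroutine is started
    # from inside one instead of being passed directly
    loop.call_later(0.05, start_greet)
    await asyncio.wait_for(done.wait(), timeout=2)
    return results


results = asyncio.run(main())
print(results)
```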

Coroutines written for asyncio use yield from syntax to signify asynchronous operations while still looking reasonably synchronous otherwise. This is similar to Twisted's defer.inlineCallbacks decorator. If callbacks are more your style, though, you can add them to Tasks and Futures with the add_done_callback method.
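
For the callback style, a small sketch (again using the later async/await syntax, Python 3.7+) might look like this. fetch_answer and on_done are hypothetical names.

```python
import asyncio


async def fetch_answer():
    await asyncio.sleep(0.01)
    return 42


async def main():
    seen = []

    def on_done(task):
        # the callback receives the finished Task; result() returns its value
        seen.append(task.result())

    task = asyncio.ensure_future(fetch_answer())
    task.add_done_callback(on_done)
    await task
    # give any callbacks still queued on the loop a chance to run
    await asyncio.sleep(0)
    return seen


seen = asyncio.run(main())
print(seen)
```

The callback is handed the task itself rather than the result, so it is up to the callback to call result() (or exception()) on it.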

One thing that asyncio provides and Twisted doesn't is asynchronous signal handling. This is something I've wanted to see in Twisted for a while, and I'm not sure why it doesn't have it.
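
The signal handling goes through the loop's add_signal_handler method, which is only available on Unix. A rough sketch, using the later async/await syntax (Python 3.7+) and sending SIGUSR1 to our own process purely for demonstration:

```python
import asyncio
import os
import signal


async def main():
    loop = asyncio.get_running_loop()
    received = asyncio.Event()

    # the callback is run by the event loop like any other callback,
    # so it can safely touch loop objects such as this Event
    loop.add_signal_handler(signal.SIGUSR1, received.set)
    try:
        os.kill(os.getpid(), signal.SIGUSR1)  # send the signal to ourselves
        await asyncio.wait_for(received.wait(), timeout=2)
    finally:
        loop.remove_signal_handler(signal.SIGUSR1)
    return received.is_set()


got_signal = asyncio.run(main())
print("signal handled:", got_signal)
```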

By far, Twisted's advantage is in its protocols and helpers, though. It has protocols for just about everything, meaning you very rarely have to implement anything yourself. If you've used twisted.web.client.getPage, you'll feel a bit of frustration using asyncio as it doesn't have anything like this - you need to implement it all yourself. The same goes for twisted.internet.defer.maybeDeferred - if you want anything like this, you'll have to implement it yourself. This is basically Twisted's biggest selling point to me.
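
To give an idea of what "implement it yourself" means for something like maybeDeferred, here is a rough equivalent in the later async/await syntax (Python 3.7+). maybe_await is a name I've made up and it is not part of asyncio. It only covers the "call this and give me the result whether or not it was asynchronous" part of the idea.

```python
import asyncio
import inspect


async def maybe_await(func, *args, **kwargs):
    """Call func and await the result only if it is awaitable (hypothetical helper)."""
    result = func(*args, **kwargs)
    if inspect.isawaitable(result):
        result = await result
    return result


def sync_double(x):
    return x * 2


async def async_double(x):
    await asyncio.sleep(0)
    return x * 2


async def main():
    return [await maybe_await(sync_double, 2), await maybe_await(async_double, 3)]


doubled = asyncio.run(main())
print(doubled)
```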

As I mentioned earlier, asyncio is intended as more of a lower-level implementation, and there are already efforts to run Twisted on it, so this likely won't be an issue for much longer, but at that point, you're just running Twisted.

Versus Gevent

Gevent's big pitch is that it makes synchronous code asynchronous, typically by monkey patching the standard library to make other packages think they're still running synchronously. This means that if your network-bound application is written synchronously, you can, with a bit of effort, get it running asynchronously under Gevent for a good performance boost under heavier loads. Definitely much less effort than rewriting it to use another library.

As hinted by the monkey patching, Gevent is very big on implicit behaviour. As such, there is no "event loop" object. This goes against the Zen of Python in a pretty big way, but I can see the advantage - it's great at taking existing synchronous code and making it asynchronous without the effort of a full rewrite.

As with all things like this, there are naturally edge cases. Gevent has been working towards a 1.0 release for a few years now (and shipped it today!), mostly ironing out these edge cases and ensuring that the API of whatever is being monkey patched has been replicated thoroughly.

Traditionally, Gevent can patch out the threading module as well, turning threads into Greenlets, which are basically coroutines. This has the disadvantage and implication that you aren't really supposed to use threads in Gevent-based applications. Compared to both Twisted and asyncio's ability to defer tasks to threads, it's a little frustrating. You're given no primitives for deferring long-blocking operations to threads, leaving you to write your own and avoid monkey patching the threading module.

So what are your options? You could go with a fork-based approach to spread out across multiple cores, but there are caveats to that as well. You could also go with something like celery for your blocking tasks, but my research left it very unclear how well the client library works with Gevent. But what should you do? That's unclear. I did say there were edge cases.

I spent a few months working with Gevent for a personal project, and I found the implicit behaviour more mind-bending than Twisted and asyncio's reactive behaviour. What do I mean by that? Basically in Gevent, you have to know what direction data is (supposed to be) going in. You need to know "at this point, I should be receiving some data on this socket". If you're put in a situation where both sides of the connection disagree on this, you'll enter a deadlock. The same goes for closed connections - every time you do a read, you need to check whether the connection was closed by the other end (i.e. an empty read). You don't have to do these checks in a reactive event loop, you just react to the event that it was closed.
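
To make the empty read check concrete, here is the shape of the loop you end up writing. This sketch uses plain blocking sockets from the standard library rather than Gevent itself, on the assumption that Gevent's patched sockets keep the same recv interface. read_until_closed is a made-up name.

```python
import socket


def read_until_closed(sock):
    """Collect data until the peer closes the connection."""
    chunks = []
    while True:
        data = sock.recv(1024)
        if not data:
            # an empty read is the only signal that the other end closed
            break
        chunks.append(data)
    return b''.join(chunks)


# a connected pair of sockets stands in for a real client and server
left, right = socket.socketpair()
left.sendall(b'hello')
left.close()
received = read_until_closed(right)
right.close()
print(received)
```

Compare that with the echo server earlier, where the protocol never polls for the close at all and simply gets its connection_lost method called.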

By far, though, the biggest thing for me is that Gevent has no Python 3 support. There's a third party fork that brings Gevent to Python 3, but nothing official. There's a lot of talk about the matter on the GitHub repo, but no action as of yet.

Conclusions

I like asyncio. It's basically a minimal Twisted, and I don't think that's a bad thing. Reflecting on what I've written here, I haven't actually written very much about it, but that's because anything I said would basically be the purple monkey dishwasher version of the PEP.

Despite the similarities, I don't think people will complain about asyncio the way they complain about Twisted. This is because coroutine support, via yield from, has been integrated from the beginning. The defer.inlineCallbacks decorator, by contrast, was a late addition to Twisted, and wasn't even possible until Python 2.5. That makes it a victim of history and circumstance, I suppose. That and the camelCasing... but whatever. It's still pretty awesome.

Another complaint that people have with Twisted is that you're "buying into the framework" because so many other things don't interact well with it. But that's true of a lot of things. That's what Gevent attempts to solve, but as I pointed out earlier, there are a few caveats to this. Even then, you're never not buying into Gevent - you're buying into its ability to trick other libraries into thinking that they're still synchronous. asyncio, however, aims to fix this problem by allowing you to run Twisted and Gevent code side-by-side, provided the glue code exists.

This whole thing kind of reads as a big "asyncio and Twisted vs Gevent-and-Gevent-is-frustrating-sometimes" but it's really not intended to be like that. They're all libraries that are good at solving the problems they intend to solve, and asyncio is a good foundation for bringing them all together.
