Drinkin gasoline and wine

post zenmachine musings

Tab Sink for Sept 2013

Tab sink (or all that I’ve stuck on OneTab plus 40+ tabs on 3 browser windows)

Tools

  • http://www.cis.upenn.edu/~bcpierce/unison/ - bilateral, multi-os file transfer
  • http://stackoverflow.com/questions/8650031/update-a-git-repository-through-a-git-hook-in-python
  • http://www.jackosx.com/ - Jack audio connector for mac os X (intercepts/reroute audio)
  • http://www.icecast.org/
  • http://butt.sourceforge.net/ - MacOS X encoder for icecast

Go

  • https://github.com/miekg/gobook - Golang Book
  • http://www.goworker.org/ - resque compatible Go implementation. Screw up what is already doomed by design
  • https://github.com/golang/groupcache - power up that cache
  • https://github.com/goraft/raft - raft consensus protocol
  • https://github.com/stathat/consistent/ - stathat’s consistent hash library that I’m using on go-redis
  • http://www.gocircuit.org/vena.html - substitute for OpenTSDB using Go and Leveldb
  • http://www.tamber.com/posts/ferret.html - search engine
  • http://www.goinggo.net/2013/09/running-go-programs-in-ironworker.html

Python and machine learning

  • https://github.com/sloria/TextBlob - PoS tagging, Sentiment analysis in python
  • https://textblob.readthedocs.org/en/latest/ - textblob docs
  • http://textblob-api-1743413701.us-east-1.elb.amazonaws.com/index.html - TextBlob demo

Load test / http traffic visualisation

  • https://github.com/andrewvc/engulf | andrewvc/engulf · GitHub
  • https://github.com/mnot/htracr/ | mnot/htracr
  • http://mnot.github.io/redbot/ | mnot/redbot @ GitHub

Message queue/brokers

  • https://github.com/tef/botox | tef/botox - SQS clone
  • http://blog.thoonk.com/ - data schema for job and message queues

Data mining code and Q&A repositories

  • https://bigquery.cloud.google.com/table/publicdata:samples.github_timeline?pli=1 | Google BigQuery
  • http://docs.libsaas.net/en/latest/generated/github/ | GitHub — libsaas 0.1 documentation
  • http://paulmillr.com/posts/github-pull-request-stats/ | Paul Miller — Your pull request won’t be accepted
  • http://blog.stackoverflow.com/category/cc-wiki-dump/ | cc-wiki-dump « Blog – Stack Exchange
  • http://www.brentozar.com/archive/2010/02/querying-the-stackoverflow-data-dump/ | Querying the StackOverflow Data Dump | Brent Ozar Unlimited
  • http://data.stackexchange.com/stackoverflow/query/edit/119630 | Query Stack Overflow - Stack Exchange Data Explorer
  • http://meta.stackoverflow.com/questions/2677/database-schema-documentation-for-the-public-data-dump-and-data-explorer | Database schema documentation for the public data dump and Data Explorer - Meta Stack Overflow
  • https://github.com/es-analysis/plato
  • http://meta.stackoverflow.com/questions/159715/should-i-answer-my-questions-that-were-refused-by-community | Should I answer my questions that were refused by community? - Meta Stack Overflow
  • http://meta.stackoverflow.com/questions/10582/what-is-a-closed-question | What is a “closed” question? - Meta Stack Overflow

Email layouts/tools/servers

  • http://purecss.io/layouts/email/ | Pure
  • https://github.com/lamoanh/gmail_skin | lamoanh/gmail_skin · GitHub
  • http://notmuchmail.org/ | notmuch
  • https://github.com/teythoon/afew | teythoon/afew
  • http://lamsonproject.org/ | LamsonProject: Lamson The Python Mail Server

Chart libraries/Browser side libraries

  • https://github.com/skorulis/stork | skorulis/stork - export canvas drawing to PNG
  • https://github.com/mher/chartkick.py | mher/chartkick.py - create js charts from python
  • https://github.com/flot/flot/blob/master/API.md | flot/API.md at master · flot/flot · GitHub
  • http://blog.videojs.com/ | Video.js Blog - Video.js is a JavaScript framework for HTML5 and Flash video
  • http://www.google.com/tagmanager/ | Google Tag Manager official website
  • https://togetherjs.com/ - collaboration library by Mozilla

Geolocation, topology and libraries

  • http://leafletjs.com/ - Mobile-friendly OSS interactive maps
  • http://www.geojson.org/geojson-spec.html - geospatial data interchange format
  • https://github.com/topojson/topojson-specification/blob/master/README.md - topological data format spec
  • https://help.github.com/articles/mapping-geojson-files-on-github - GEOJson on Github

PaaS

  • https://github.com/dyndrop | dyndrop (Dyndrop)
  • http://nick.stinemat.es/#continuous-deployment | Automation Blog - Nick Stinemates
  • http://codeascraft.com/2013/09/23/lxc-running-14000-tests-per-day-and-beyond-part-1/#!
  • http://osv.io/ - JVM over hypervisor
  • https://github.com/eugeneware/docker-wordpress-nginx - Docker + nginx + wordpress
  • https://github.com/jbfink/docker-wordpress - docker and wordpress

Ping latency data reports

  • http://www-iepm.slac.stanford.edu/pinger/ | PingER (Ping End-to-end Reporting)
  • http://www-iepm.slac.stanford.edu/pinger/explorer.html | PingER Data Explorer
  • http://www.google.com/publicdata/explore?ds=nc650op6n4i1l_&ctype=m&strail=false&bcs=d&nselm=s&met_c=minimum_rtt&met_s=population&ifdim=country&tunit=Y&pit=1320217200000&uniSize=0.03500000000000001&mapType=t&yMax=64.1826&xMin=-175.1992&xMax=179.1992&iconSize=0.5&icfg&yMin=-41.4648 | Pinger Visual Landscape - Google Public Data Explorer

Monitoring, analytics

  • https://speakerdeck.com/christineyen/gluecon-2013-think-backwards-realtime-analytics-plus-cassandra | GlueCon 2013 - Think Backwards: Realtime Analytics + Cassandra // Speaker Deck
  • https://github.com/snowplow/snowplow/wiki/Technical-architecture - Snowplow analytics (hadoop, redshift)
  • https://speakerdeck.com/auxesis/the-psychology-of-alert-design
  • https://speakerdeck.com/astanway/mom-my-algorithms-suck - nice prezo on models vs monitoring
  • https://conf-slac.stanford.edu/xldb-2013/sites/conf-slac.stanford.edu.xldb-2013/files/RaviMurthy_Facebook_Tues_xldb_analytics_infra_2013.pdf

Metrics and statistics

  • http://en.wikipedia.org/wiki/Apdex | Apdex - Wikipedia, the free encyclopedia
  • http://www.joyent.com/blog/visualizations-for-outliers-and-multi-modal-latency - series of articles on outliers and latency visualisation

Automated browser testing

  • https://github.com/ryanseddon/bunyip | ryanseddon/bunyip · GitHub
  • https://forwardhq.com/# | Share localhost over the Web — Forward

Real world accidents

  • http://roc.cs.berkeley.edu/294fall01/slides/Tetzlaff.pdf | roc.cs.berkeley.edu/294fall01/slides/Tetzlaff.pdf
  • http://en.wikipedia.org/wiki/Three_Mile_Island_accident | Three Mile Island accident - Wikipedia, the free encyclopedia
  • http://en.wikipedia.org/wiki/Alaska_Airlines_Flight_261 | Alaska Airlines Flight 261 - Wikipedia, the free encyclopedia
  • https://www.ntsb.gov/investigations/summary/AAR0201.html | Aircraft Accident Report: AAR-02-01

DHT, torrent, distributed system

  • http://btsync.s3-website-us-east-1.amazonaws.com/BitTorrentSyncUserGuide.pdf | btsync.s3-website-us-east-1.amazonaws.com/BitTorrentSyncUserGuide.pdf
  • http://engineering.bittorrent.com/2013/01/22/bittorrent-tech-talks-dht/ | BitTorrent Tech Talks: DHT | The BitTorrent Engineering Blog
  • http://labs.bittorrent.com/experiments/sync/technology.html | BitTorrent Labs
  • http://www.iis.sinica.edu.tw/page/jise/2006/200609_16.pdf - A Fault-Tolerant Protocol for Generating Sequence Numbers for Total Ordering Group Communication in Distributed Systems

Metrics

This holiday I’ve released two new projects at my github repository: PyMetrics and TxMetrics. They are metrics libraries inspired by Coda Hale’s Metrics project [http://metrics.codahale.com/]. Although not that complete, I’ve ported to python and the twisted framework the parts I kept repeating across my projects and the parts I felt were missing.

Some months ago, I’ve ported a descriptive stats library to python and also integrated it into PyMetrics and TxMetrics. These libraries are backed by redis and I intend to release a port to ruby using the same underlying data structures.

Most applications have at least one form of counter spread around. My intention was to use the same semantics for these situations and also to introduce new ways to measure data without being bound to standard monitoring software that extracts data from TCP ports or log files.

The difference between them is that for TxMetrics I had to port the traditional python approach to twisted, using the deferred constructs and another library that I contribute to, txredisapi.

With these libraries you can extract data in real time from your application and go from monitoring to dashboards, which is the next step I’m planning with the ruby client: integrating with Dashing.
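The counter/timer semantics can be pictured with an in-memory stand-in (the class names here are illustrative, not the actual PyMetrics API; the real libraries keep these values in redis):

```python
import time

class Counter:
    """Coda Hale-style counter: a named value you increment or decrement."""
    def __init__(self):
        self.value = 0
    def inc(self, n=1):
        self.value += n
    def dec(self, n=1):
        self.value -= n

class Timer:
    """Record durations and derive descriptive stats from the samples."""
    def __init__(self):
        self.samples = []
    def time(self, fn, *args, **kwargs):
        start = time.time()
        try:
            return fn(*args, **kwargs)
        finally:
            self.samples.append(time.time() - start)
    def mean(self):
        return sum(self.samples) / len(self.samples) if self.samples else 0.0

requests = Counter()
requests.inc()
requests.inc(2)   # requests.value is now 3; a redis-backed version issues INCRBY
```

The redis-backed version trades the instance attributes for INCRBY and sorted structures, which is what makes the values visible to dashboards outside the process.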


Ye Olde Devops Notes and Links

Note: this is an old post that I’ve had for months in my pipeline waiting for another project.

late night chats with cv - intended to be published somewhere at it’s never Lispus

This is a summary of a gtalk chat about a deploy/keep-it-simple workflow, without touching on devops or any other diatribes. Mostly interesting links and notes I’ve extracted from the chat history.

@cv + @gleicon

using http and memcached as messaging protocols

  • clients everywhere
  • sanitize following their rules (http timeouts/memcached keys)
  • they are not message transports, so keep messages short
  • pool strategies (more than one connection)
  • reconnect strategies
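The pool and reconnect bullets can be sketched generically; `connect` here is a placeholder factory for whatever client is in use (memcached, http):

```python
import queue

class ConnectionPool:
    """Keep more than one connection around; swap dead ones on checkout."""
    def __init__(self, connect, size=4):
        self._connect = connect          # factory; assumed to return objects with .alive
        self._pool = queue.Queue()
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self):
        conn = self._pool.get()
        if not getattr(conn, "alive", False):
            conn = self._connect()       # reconnect strategy: replace a dead connection
        return conn

    def release(self, conn):
        self._pool.put(conn)

class FakeConn:
    alive = True

pool = ConnectionPool(FakeConn, size=2)
c = pool.acquire()
pool.release(c)
```

A real client would also sanitize per the transport’s rules (http timeouts, memcached key limits) before the message ever reaches the pool.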

using lxc and rootfs images

  • rootfs can run on KVM and LXC (probably vbox ?)
  • vagrant/simplestack to interface to local/external VMs (would need to patch vagrant, etc)
  • use ubuntu/whatever new image with all bundled software

package when tagged

  • bricklayer/fpm/rpmbuild/etc
  • deploy images around before committing to production release
  • no need for conf management if the image is already loaded

good images

  • new software
  • redis/memcached up and running at localhost
  • few configuration items (localhost or ENV var)
  • use buildpacks
  • bake images with packaged + tagged production version, distribute over nginx

deploy

  • fetch images
  • deploy on VMs/LXC
  • instrument images with watchdog agent
  • messages over queue/http queue/pubsub
  • one server lost -> spin new server
  • loadbalancer ?
  • monitoring: riemann for real distributed stuff/uptime for local

links

  • https://github.com/cv/escape-server-config (config server management)
  • https://github.com/locaweb/bricklayer (packaging app server)
  • http://www.stgraber.org/2012/03/04/booting-an-ubuntu-12-04-virtual-machine-in-an-lxc-container/ (linux containers)
  • https://github.com/locaweb/simplestack (hypervisor api)
  • http://vagrantup.com/
  • http://blog.heroku.com/archives/2012/7/17/buildpacks/
  • https://github.com/ddollar/mason (buildpack automation)
  • https://github.com/peterkeen/dokuen (mini paas)
  • https://github.com/ddollar/foreman (app control)
  • https://github.com/fzaninotto/uptime (uptime monitoring)
  • http://www.openresty.org/ (nginx +lua + redis dynamic vhost)
  • http://graylog2.org/
  • https://github.com/locaweb/logix (graylog2 syslog -> amqp for graylog2)
  • https://github.com/gleicon/python_dns_servers (possible dynamic dns)

Tab Sink for the End of 2012

Note: this is an old post that I’ve had for months in my pipeline waiting for another project

The tab sink for the end of 2012 (originally I started this post Oct 20 2012)

Most of this stuff was sitting on my tabs for the whole week (and even weeks before that), and only now have I managed to filter and check it.

  • Last month the Surge conference happened (http://omniti.com/surge/2012) with a lot of goodies. This week the videos were published and there’s lots of good stuff. I can highlight two examples (not that the others are not good):

    Arthur Bergman and Mysteries of a CDN explained (Fastly)

    Pedro Canahuati and Operating at Scale (Facebook)

  • A distributed counter from basho, presented at RICON: riak_dt. I think the videos are on the way. I’ve watched it through live streaming. Good stuff all around.

  • Still on conferences, it seems like Monitorama will be interesting. There’s a lot to be said on monitoring - less on tools, more on doing the smart things. [This presentation](https://speakerdeck.com/u/obfuscurity/p/the-state-of-open-source-monitoring) from @obfuscurity is a great review and starting point. Fuck nagios.

  • Make sure to review [Agile Data](http://ofps.oreilly.com/titles/9781449326265/index.html) from Russell Jurney - although I always associate “Agile” with bikeshedding, here it’s only in the title and the book means it. From Chapter 2 on you will find practical examples of data analysis. The pieces on email are very interesting.

  • Sketch data structures are compact summaries of data that would otherwise take a lot of space/time to process. This review covers Bloom Filters and Count-Min structures in a very clear manner. Be the smart guy in your local bikeshedding meeting by throwing out a “well, we had webscale data so I reimplemented my indexes as count-min just for fun”.
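The count-min structure fits in a few lines of python, which makes the point: d hashed rows of w counters each, with queries taking the minimum across rows, so estimates can only over-count, never under-count. A toy sketch:

```python
import hashlib

class CountMin:
    """Toy count-min sketch: fixed memory, approximate frequencies."""
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # one independent-ish hash per row, derived by salting with the row index
        for i in range(self.depth):
            h = hashlib.md5(("%d:%s" % (i, item)).encode()).hexdigest()
            yield i, int(h, 16) % self.width

    def add(self, item, count=1):
        for i, b in self._buckets(item):
            self.rows[i][b] += count

    def query(self, item):
        # min across rows: hash collisions only inflate counters, never deflate
        return min(self.rows[i][b] for i, b in self._buckets(item))

cm = CountMin()
for word in ["redis", "redis", "redis", "queue"]:
    cm.add(word)
# cm.query("redis") is at least 3 (collisions can over-count, never under)
```

The memory is width × depth counters no matter how many distinct items go in, which is the whole trade: exactness for bounded space.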

  • A series from Performance Dynamics on Little’s Law and I/O performance here and here plus a piece on bandwidth vs latency. Let’s get educated.
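Little’s Law itself is one multiplication: the average number of requests in flight N equals throughput X times residence (response) time R. With made-up numbers:

```python
# Little's Law: N = X * R
# average requests in flight = throughput x mean residence (response) time
throughput = 200.0   # requests per second (made-up)
residence = 0.050    # seconds per request (made-up)
in_flight = throughput * residence   # ~10 requests in the system at any moment
```

Run it backwards and it answers capacity questions too: a server that can hold 10 concurrent requests at 50 ms each tops out around 200 req/s.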

  • A piece on a graph based recommendation engine using neo4j. Worth the exercise; this might help on event correlation too (same principle, but different techniques to relate clusters of data).

  • A private PaaS with Mongrel2 and ZeroMQ - odd, but very complete. I’ve tried it with mixed results but can see the reasoning in extending mongrel2 to do it.

  • Still on PaaS: What happens when you push to heroku and the openruko github repo are full of awesome, by @nonuby

  • hipache is a http and ws proxy/router. These things are the heart of a PaaS. I’d love to see it as a nginx module. It’s doable with openresty but involves lua and other modules. From the great @dotcloud team.

  • We’ve started this year a series of techtalks at (http://www.locaweb.com.br) to spread the knowledge and create a sense of community inside the company. Mediacore was our platform of choice to manage the videos. Yeah, create your own youtube etc.

  • Jeez, I don’t even… browserver - it works, but oh god why :D Nice code and concept; I wish it were torrent-based so we could p2p between nearby browsers.

Stay tuned. This list is bound to come back as soon as I clog my browser with tabs again (anytime between one weekend and 2 weeks). Cheers.

Porting an App From Tornado to Cyclone (Quick defer.inlineCallbacks/yield Primer)

Note: This will probably turn out in a screencast some time or another.

Porting an app from Tornado to Cyclone is an easy task. There are some quirks, though, mostly related to Twisted; working through them is necessary so you can enjoy the environment and async drivers properly. Tornado is known for providing good abstractions so you can hook regular drivers into its IOLoop. But as event loops go, most require that the whole stack is aware of the cooperative nature of its components. With twisted you already have that, and bundled with cyclone you get Redis, MongoDB, SQLite and other drivers for applications and protocols.

I’ve found a neat app called RedisLive and ported it to cyclone. It is basically a Redis real time resource monitor composed of two parts: a web interface and a daemon to collect data.

To keep it simple and within a reasonable amount of code to be explained, I’ve ported just the web interface and created a separate data provider. I had started a twisted based metric collector, which helped me fix and add some missing pieces to the cyclone redis driver, but I wanted to stick with the original collector.

I’ll walk through the main parts that were changed. The forked repository is at [my github](https://github.com/gleicon/RedisLive). There’s a full diff file inside my project.

A good approach to port a web application from Tornado to Cyclone is to tackle the web section first, and that’s what I did. Inside the folder RedisLive/src/api/controller live the Controllers for each route listed in RedisLive/src/redis-live.py. Starting with redis-live.py:

diff --git a/src/redis-live.py b/src/redis-live.py
index 43479f4..9318c35 100755
--- a/src/redis-live.py
+++ b/src/redis-live.py
@@ -1,8 +1,8 @@
 #! /usr/bin/env python

-import tornado.ioloop
-import tornado.options
-import tornado.web
+from twisted.internet import reactor
+import cyclone.options
+import cyclone.web

 from api.controller.BaseStaticFileHandler import BaseStaticFileHandler

@@ -15,7 +15,7 @@ from api.controller.TopKeysController import TopKeysController


 # Bootup
-application = tornado.web.Application([
+application = cyclone.web.Application([
   (r"/api/servers", ServerListController),
   (r"/api/info", InfoController),
   (r"/api/memory", MemoryController),
@@ -27,6 +27,6 @@ application = tornado.web.Application([


 if __name__ == "__main__":
-   tornado.options.parse_command_line()
-   application.listen(8888)
-   tornado.ioloop.IOLoop.instance().start()
+  cyclone.options.parse_command_line()
+  reactor.listenTCP(8888, application, interface="127.0.0.1")                 
+  reactor.run()

Basically it turned into a simple twisted application (not a TAC, though). I’ve changed package names from tornado.* to cyclone.* and switched from tornado.ioloop to reactor.listenTCP()/reactor.run(). The routing and class structure stay the same, as we are going to see in the Controllers. Most of the changes are package renames (from tornado.web to cyclone.web) and surrounding the driver calls with yield/defer.inlineCallbacks/defer.returnValue.

The regular way to call a method or function, the twisted way(tm), is to receive a Deferred, to which you attach one callback for success and another for errors. By using defer.* and yield, the code gets more readable, without that many callbacks following each external call. The downside is that while you are using this decorator and returnValue, your function returns generators, so it becomes incompatible with normal functions. You end up structuring your code around it to save time, and in some places you have to stick to regular callbacks to stay compatible with libraries ported from non-twisted code. Bikeshedding apart, it’s a good resource that Twisted provides.
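The mechanics are easier to see with a toy stand-in (this is not Twisted’s implementation, just its shape): a deferred chains callbacks, and an inlineCallbacks-style decorator drives a generator, sending each yielded deferred’s result back into it:

```python
class Deferred:
    """Toy deferred: holds callbacks, fires them when a result arrives."""
    def __init__(self):
        self._callbacks = []
        self._fired = False
        self.result = None

    def addCallback(self, fn):
        if self._fired:
            self.result = fn(self.result)
        else:
            self._callbacks.append(fn)
        return self

    def callback(self, result):
        self._fired = True
        self.result = result
        for fn in self._callbacks:
            self.result = fn(self.result)

def inline_callbacks(gen_fn):
    """Drive a generator: each yielded Deferred resumes it with the result."""
    def wrapper(*args, **kwargs):
        gen = gen_fn(*args, **kwargs)
        out = Deferred()
        def step(value):
            try:
                d = gen.send(value)
            except StopIteration as stop:
                out.callback(getattr(stop, "value", None))
                return value
            d.addCallback(step)
            return value
        step(None)
        return out
    return wrapper

# callback style: every external call grows the chain
def get_stats_cb(provider):
    return provider().addCallback(lambda r: [x * 2 for x in r])

# yield style: same flow, reads top to bottom
@inline_callbacks
def get_stats_yield(provider):
    stats = yield provider()
    return [x * 2 for x in stats]

def provider():
    d = Deferred()
    d.callback([1, 2, 3])   # fires synchronously here; real drivers fire later
    return d
```

Both versions produce the same result, but the yield version reads like blocking code, which is the whole appeal of defer.inlineCallbacks.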

diff --git a/src/api/controller/BaseStaticFileHandler.py b/src/api/controller/BaseStaticFileHandler.py
index 162fa62..6343a53 100644
--- a/src/api/controller/BaseStaticFileHandler.py
+++ b/src/api/controller/BaseStaticFileHandler.py
@@ -1,6 +1,6 @@
-import tornado.web
+import cyclone.web

-class BaseStaticFileHandler(tornado.web.StaticFileHandler):
+class BaseStaticFileHandler(cyclone.web.StaticFileHandler):
    def compute_etag(self):
        return None

In this case we didn’t need to change anything beyond module names. Now a bit of defer/yield:

diff --git a/src/api/controller/CommandsController.py b/src/api/controller/CommandsController.py
index cd9df26..ac046e3 100644
--- a/src/api/controller/CommandsController.py
+++ b/src/api/controller/CommandsController.py
@@ -1,12 +1,12 @@
 from BaseController import BaseController
-import tornado.ioloop
-import tornado.web
 import dateutil.parser
 from datetime import datetime, timedelta
+from twisted.internet import defer


 class CommandsController(BaseController):

+    @defer.inlineCallbacks
     def get(self):
         """Serves a GET request.
         """
@@ -45,7 +45,7 @@ class CommandsController(BaseController):
           group_by = "second"

         combined_data = []
-        stats = self.stats_provider.get_command_stats(server, start, end,
+        stats = yield self.stats_provider.get_command_stats(server, start, end,
                                                       group_by)
         for data in stats:
             combined_data.append([data[1], data[0]])

Note that before the “get” method we’ve added this decorator. If you are using other cyclone decorators, make sure this is the first one (for example, if you are using cyclone.web.authenticated and so on). Also, when using the stats_provider, the call got prepended by yield so the decorator can capture the callback result and make it available in the variable stats, without needing an explicit callback (stats = self.stats_provider(…).addCallback(lambda r: do_something(r))).

The rest of the diff is at the repository; let’s examine the redis stats provider. To keep the original data collector I’ve cloned RedisLive/src/dataprovider/redisprovider.py into txredisprovider.py. Cyclone comes with txredisapi bundled as a package, and I wanted to use it without having to rewrite all the calculation code. Ideally I’d have changed the whole class and collector. Most of the changes were additions of yield/defer.inlineCallbacks and specifics of the cyclone driver, such as transactions (started by driver.multi() instead of pipeline).

It was a quick job spread over two days (1h/2h each day) to get RedisLive working with cyclone, including the twisted part. Twisted is a very mature framework and it’s worth knowing the protocols it already provides. Also, the provided reactors and task primitives (look into redis-monitor-tx.tac) are useful for splitting heavy work into small tasks.

Python and GEvent

The last post spent some time on cyclone, and it wasn’t fair that I mentioned gevent only briefly. I have been using this library for quick prototypes, production code and system upgrades. It’s not instant-evented-magic-for-crappy-code, but it provides simple and solid primitives, such as greenlets, that enable the use of good libraries in a straightforward manner.

For instance, the great kombu library, which provides abstraction over different messaging protocols, is not available for twisted. Worse yet, the txAMQP library is not straightforward to use. At the mure project I wanted to come up with a quick and simple agent network that communicated over a shared bus. I wasn’t worried about the kind of channel as long as I could prototype and run it quickly. It proved a good choice: in a short time I implemented an EventEmitter clone inspired by node.js, and a few days ago a bridge between python and node.js event emitters.

It could be done using twisted, but I would have had to shave the yak of a common multi-broker messaging layer or stick to a single message broker. Not a problem if I had known clearly from the start what I wanted it to be. But having gevent helped a lot to leverage common blocking libraries and to use greenlets as a thread abstraction.

mure/core.py
    def add_worker(self, workername, callback):
        self.workers[workername].add(callback)
        def _listener():
            qname = Queue(workername, Exchange("exchange:%s" % workername, type='fanout'))
            while self._connected == False:
                print "waiting %s" % self._connected
                gevent.sleep(1)
            with BrokerConnection(self._transport_url) as conn:
                with conn.SimpleQueue(qname) as queue:
                    while True:
                        try:
                            message = queue.get(block=True, timeout=10)
                            if message:
                                self._execute_callbacks(workername, message.payload)
                                message.ack()
                                if self._connected == False: break
                        except:
                            pass
        gevent.spawn(_listener)

The last line spawns the function _listener, which binds to an exchange queue. Each of these listeners takes care of a communication channel and executes the callbacks associated with that worker name. Exchange, queue and worker are the same thing applied in different contexts. These workers (@worker(‘name_of_worker’)) are stored in a hash, each item a list of listeners. This fans the messages out to the right recipients.

Another great gevent companion is bottle, a DSL for web programming. Its interface is clean, and combined with gevent it lets you quickly come up with thin webservice interfaces. I’ve created an application called uurl - an url shortener entirely based on gevent, bottle and redis. I’ve set out to rewrite it from time to time, in different languages and frameworks, to get a hang of their components, and so far this is the cleanest implementation. It started as a WSGI service and later I converted it to gevent by simply changing servers, monkey patching all and using a redis connection pool.

Monkey patching is the technique gevent uses to convert the original socket, thread and other python modules to be non-blocking, using its greenlets and I/O loop, in a way that requires minimal code change from a blocking application.

uurl.py
class GEventServerAdapter(ServerAdapter):
    def run(self, handler):
        from gevent import monkey
        monkey.patch_socket()
        from gevent.wsgi import WSGIServer
        WSGIServer((self.host, self.port), handler).serve_forever()

To close this post, I’d like to add that I usually compare this setup with Ruby/Sinatra/Thin. As a derivative of RestMQ I’ve created TinyMQ - a set of small implementations of RestMQ’s core ideas. This is a subject for another post, but the whole message broker ran in less than 100 lines: tinymq.py.

Cyclone - a Twisted Based Tornado Implementation

Overview

Some time ago, a company called FriendFeed released Tornado, a neat web application server for python. After some press and unquestionable results, it was discussed whether it should have used the Twisted Framework as its foundation instead of implementing a new ioloop. Long story short, Tornado shares some similarities with Twisted, but its programming API is better looking than twisted.web.

After some trials by different people, Alex Fiori forked Tornado and bundled it with a Twisted backend and some other goodies - calling it cyclone. After a while I started using it to build RestMQ and began contributing code for WebSockets and other drivers. There is a lot of sense in combining both worlds, as Twisted has an extensive library of protocols and clients, a well defined programming model (like it or not, based on deferreds/futures and generators) and a mature cross-platform ioloop implementation. Cyclone’s gettext implementation was merged back some time ago, and we constantly merge interesting features from upstream.

I do most of my coding in python splitting time between cyclone and gevent and right now I gotta say that cyclone has great features that compete in terms of productivity with Tornado.

Code that is built on tornado will run easily after correcting the package names. For the parts related to the ioloop, the same functionality is mapped onto twisted - such as timers and pools. To build new protocols you can leverage LineProtocol and other interesting tx classes. The best part is taking advantage of drivers: in an event loop, if you use a regular driver that can block (pause while waiting for an answer from the network or a heavy calculation), all other operations are halted too.

If you have a defined programming model to deal with it (which both tornado and twisted define), it is a matter of yielding at the right moment, or of using a deferred return to realize the result of the operation later. That can lead to a kind of callback hell, both for reading code and profiling it, but few abstractions will take you far away from it.

Interesting Cyclone features

Tornado’s core (ioloop) was replaced by a twisted based factory which yields the right reactor. Over this structure, the protocol implementations and clients were adapted with minimal to no interface changes.

The most affected module initially was cyclone.web, but the whole structure changed and got bundled drivers such as mongodb, redis and sqlite, and protocols such as XMLRPC, JSONRPC, websockets and SSE. There is already an email module, based on TwistedMail, which can serve as a template-to-message app. All these features are natively asynchronous.

Beyond that, a cyclone app is a twisted protocol and can take advantage of the surrounding structure, such as plugins and PyDirector/cpu affinity. It was easy to merge or create due to the synergy with already existing twisted applications. There is also an application skeleton and a minimal bottle.py DSL port - both allowing for quickstart web applications.

Much of the authentication and authorization is done over decorators, allowing for clean code - along with the inline deferreds:

class IndexHandler(cyclone.web.RequestHandler):
    @cyclone.web.authenticated  # triggers authentication
    @defer.inlineCallbacks      # allows for inline callbacks
    def get(self):
        result = yield self.do_download()  # inline callback, no need to explicitly add one
        self.write(result)

That alone may help on the callback spaghetti but it keeps being twisted.

Evented I/O intermission - is it faster ?

No. It’s just a choice of multiplexer that allows for better time utilization. It helps scale the same resources better, as do other approaches such as greenlets over an evented loop (gevent). The most important thing to note is that they are not a substitute for threads or a fix for VM limitations. My setups are usually 1 or 2 instances per core, with affinity and a load balancer in front of them. This can be done differently, using threadpools or even processes.

Last year I presented at the Sao Paulo Perl Workshop and at OSCon on this subject, and the feedback I got is that most of the time the application itself gets complex, so evented I/O ends up being one of the things that shapes these changes (for good or bad), but not a safeguard or guarantee that quality will keep up.

EOF

Check the code and try it for yourself. I consider using cyclone a good and gentle intro to twisted, as it sets a clear objective around web applications. There is plenty of good code in the demos directory, and the app skeleton already comes with a bootstrap.css based application boilerplate. Also, send patches.

On That Message Queue…

…that you are trying to do with Redis, the answer is NO.

Not due to any issue with Redis (which I use in a lot of projects; it’s a fine piece of software), but because its primitives are not enough to pass as a message broker.

Looking at Redis and ZeroMQ as message brokers in the classic ActiveMQ/RabbitMQ/RestMQ sense is naive, because both are transports (and, in Redis’ case, sometimes persistence) - building blocks that can be attached to systems and libraries like these.

The regular drivers for Redis usually provide no reconnection in case of error or subscriber disconnect, so you will probably end up with your process hanging or, if you are lucky, killed by an exception on a lost connection. The proper way to do it is to surround Redis with a management layer, as RestMQ or Resque does.

An interesting approach taken in the twisted redis client is the use of connection pools which can reconnect. That was the single feature that made RestMQ possible. Kombu connections follow the same pattern, abstracting fan-outs and routes over a connection pool to Redis - the simplest fallback in case you don’t want to install a more complex message broker.

Relying only on pub/sub channels without a fallback is not a good idea, especially if you rely on multiple consumers. On the other hand, for simple one-time, no-distributed-locks messaging (as with udp multicast or service discovery), it might be a good choice because: a) there is no persistence, and b) if the transport is not available, the processes (or agents) stay idle.

The fine line here is the direction of the messages. In an exchange environment, the broker does the good work of managing the transport’s shortcomings. In a fire-and-forget setting where the messages are important, the broker helps keep the message somewhere until it’s fetched. Fire and forget of disposable messages might be a good choice if there is no immediate action to be taken and there is some kind of retransmission/expiry on the sender’s side. Remember the SMTP protocol.
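The sender-side retransmission/expiry idea can be sketched like this; `send` stands in for whatever transport is in use:

```python
import time

def send_with_retry(send, message, retries=3, ttl=60.0, backoff=0.0):
    """Try a few times, give up once the message has expired."""
    deadline = time.time() + ttl
    for attempt in range(retries):
        if time.time() > deadline:
            return False        # message expired: disposable, drop it
        try:
            send(message)
            return True
        except IOError:
            time.sleep(backoff * (2 ** attempt))  # exponential backoff between tries
    return False

# example: a transport that fails once and then recovers
calls = []
def flaky_send(msg):
    calls.append(msg)
    if len(calls) == 1:
        raise IOError("broker away")

ok = send_with_retry(flaky_send, "hello", backoff=0.0)
```

This is roughly what SMTP servers do with their retry queues, minus the on-disk spool; with no broker in the middle, the sender owns both the retry and the decision to give up.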

That’s not even touching the cases where a message queue is used as a kind of distributed lock where only one of many similar consumers might get a message, instead of a broadcast scenario.

Memcached Backend Engines

The memcached protocol is well known and implemented in many languages and frameworks. Its primitives revolve around getting and setting values that are internally mapped to keys.

There are two kinds of protocol: ascii and binary. The recommended protocol is the binary protocol, which is modern and has room for new features.

I’ve been trying my hand at some memcached server implementations in python, based on twisted and gevent, but while looking for answers regarding the protocol, I got a tip that the original memcached server supports backend engines through an ANSI C interface. In an insomniac weekend I was able to hack around two examples: a filesystem based store and a redis based store.

The filesystem engine was based entirely on @trondn’s tutorial. The Redis engine I based on my original plan for a python based memcached server. I’ve used Redis hashes to store data and attributes for each memcached key.

To create the hash, the engine issues a command like this:

HMSET key nkey data ndata flags exptime

Since Redis commands are atomic, INCR, DECR and similar methods are a given. It was a matter of using hiredis to map the right commands. One thing that I’d do differently is to implement some kind of connection pool and configuration options.
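The key-to-hash mapping is easy to picture in python; a plain dict stands in for redis here (the real engine issues HMSET/HGETALL/HINCRBY through hiredis):

```python
import time

store = {}  # key -> hash, the shape that HMSET builds in redis

def mc_set(key, data, flags=0, exptime=0):
    # HMSET key nkey <len> data <value> ndata <len> flags <f> exptime <t>
    store[key] = {"nkey": len(key), "data": data, "ndata": len(data),
                  "flags": flags, "exptime": exptime}

def mc_get(key):
    item = store.get(key)                 # HGETALL key
    if item is None:
        return None
    if item["exptime"] and item["exptime"] < time.time():
        del store[key]                    # expired: DEL key
        return None
    return item["data"]

def mc_incr(key, amount=1):
    # atomic in redis via HINCRBY; plain addition in this stand-in
    item = store[key]
    item["data"] = str(int(item["data"]) + amount)
    return item["data"]

mc_set("hits", "41")
mc_incr("hits")
```

The dict version obviously loses the atomicity; the point of backing the engine with redis is that HINCRBY gives you INCR/DECR for free.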

To test the backend’s compliance with the binary protocol I’ve used memcapable, which usually comes bundled with libmemcached. The engine must be compiled as a dynamic library so it can be loaded as memcached -E engine.so.

One thing I think a configurable memcached frontend is good for is capturing metrics. As many frameworks already have a memcached client, it’s easy to create an instance and increment/decrement counters on it.

Agents and Event Listeners for Python

When modeling a distributed system one might stumble upon the Actor model, implemented in one way or another - be it natively in Erlang or in a complete library such as Akka.

There is a lot of discussion over concurrency models, but regardless the Actor model serves well to break a task between different processes/servers.

Usually built over a messaging channel, these frameworks are adapted for a particular set of tasks. Unsurprisingly, some of them had a lot more than I needed, and the frameworks for Python were split between trying to reimplement Akka and completely different concepts.

I took some time out to build Mure to learn more about kombu, a multi-transport library for python. It’s a really simple actor library.

The decorator @worker() says that each time a message arrives at the queue named after the string, the function will be executed with the message as a parameter.

After fiddling with pyee, I’ve implemented an EventEmitter on top of it. The syntax is the same as node.js, but it’s a distributed event emitter.
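The on/emit contract maps cleanly to python; a local, non-distributed sketch (the distributed version puts a message broker between emit and the listeners):

```python
from collections import defaultdict

class EventEmitter:
    """node.js-style event emitter, local edition."""
    def __init__(self):
        self._listeners = defaultdict(list)

    def on(self, event, fn):
        self._listeners[event].append(fn)
        return fn                 # also usable as a decorator

    def emit(self, event, *args):
        for fn in self._listeners[event]:
            fn(*args)             # distributed version: publish to a queue instead

ee = EventEmitter()
seen = []
ee.on("data", seen.append)
ee.emit("data", "payload")
```

Swap the in-process loop in emit() for a kombu publish and the loop in on() for a queue consumer, and you get the distributed behavior described above.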