Vaurien, the Chaos TCP Proxy

Ever heard of the Chaos Monkey?

It’s a project at Netflix that aims to improve the fault tolerance of their infrastructure. The Chaos Monkey randomly shuts down some servers or blocks some network connections, and the system is supposed to survive these events. It’s a way to verify the high availability and fault tolerance of the system.

Besides a redundant infrastructure, if you think about reliability at the level of your web applications there are many questions that often remain unanswered:

  • What happens if the MySQL server is restarted? Are your connectors able to survive this event and continue to work properly afterwards?
  • Is your web application still working in degraded mode when Membase is down?
  • Are you sending back the right 503s when PostgreSQL times out?

Of course you can – and should – try out all these scenarios in staging while your application is getting a realistic load.

But testing these scenarios while you are building your code is also a good practice, and having automated functional tests for this is preferable.

That’s where Vaurien is useful.

Vaurien is basically a Chaos Monkey for your TCP connections. Vaurien acts as a proxy between your application and any backend.

You can use it in your functional tests or even on a real deployment through the command-line.

Installing Vaurien

You can install Vaurien directly from PyPI. The best way to do so is via pip:

$ pip install vaurien

Design

Vaurien is a TCP proxy that simply reads the data sent to it and passes it to a backend, and vice-versa.

It has built-in protocols: TCP, HTTP, Redis, SMTP, MySQL & Memcache. The TCP protocol is the default one: it just reads data on both sides and passes it along.

Having higher-level protocols is mandatory in some cases, when Vaurien needs to read a specific amount of data in the sockets, or when you need to be aware of the kind of response you’re waiting for, and so on.

Vaurien also has behaviors. A behavior is a class that is invoked every time Vaurien proxies a request. That’s how you can change what the proxy does. For instance, adding a delay or degrading the response can be implemented in a behavior.

Both protocols and behaviors are plugins, allowing you to extend Vaurien by adding new ones.
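The split between "move the bytes" and "decide what happens around the move" can be pictured with a small standalone sketch. It does not use Vaurien's code: the PassThrough class, its before/after hooks and the relay_once function are hypothetical names chosen for illustration, and in-process socket pairs stand in for the client and the backend.

```python
import socket


class PassThrough:
    """Hypothetical behavior: lets every chunk through untouched."""

    def before(self):
        # Returning True means "go ahead and forward the data".
        return True

    def after(self):
        return True


def relay_once(source, dest, behavior, bufsize=8124):
    """Read one chunk from `source` and forward it to `dest`.

    The behavior is consulted before and after the transfer.
    Returns the number of bytes forwarded.
    """
    if not behavior.before():
        return 0
    data = source.recv(bufsize)
    if data:
        dest.sendall(data)
    behavior.after()
    return len(data)


# In-process socket pairs stand in for the client and the backend.
client, proxy_front = socket.socketpair()
proxy_back, backend = socket.socketpair()

client.sendall(b"ping")
forwarded = relay_once(proxy_front, proxy_back, PassThrough())
received = backend.recv(8124)

for sock in (client, proxy_front, proxy_back, backend):
    sock.close()
```

A behavior that sleeps in before, or that returns False to drop the chunk, would change what the client observes without touching the relay code itself.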

Last (but not least), Vaurien provides a couple of APIs you can use to change the behavior of the proxy live. That’s handy when you are doing functional tests against your server: you can for instance start to add big delays and see how your web application reacts.

Using Vaurien from the command-line

Vaurien is a command-line tool.

Let’s say you want to add a delay for 20% of the HTTP requests made on google.com:

$ vaurien --protocol http --proxy localhost:8000 --backend google.com:80 \
        --behavior 20:delay

With this setup, Vaurien will stream all the traffic to google.com using the http protocol, and will add delays 20% of the time.
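To make the PERCENT:NAME notation concrete, here is a minimal sketch of how such a specification could be parsed and used to pick a behavior for each request. It is an illustration only, not Vaurien's implementation: parse_behavior_spec and pick_behavior are hypothetical helpers, and falling back to the dummy behavior for the remaining share of requests is an assumption.

```python
import random


def parse_behavior_spec(spec):
    """Turn '5:error,10:delay' into [(5, 'error'), (10, 'delay')]."""
    pairs = []
    for item in spec.split(","):
        percent, name = item.split(":", 1)
        pairs.append((int(percent), name.strip()))
    if sum(percent for percent, _ in pairs) > 100:
        raise ValueError("percentages add up to more than 100")
    return pairs


def pick_behavior(pairs, rng=random, default="dummy"):
    """Pick a behavior name; the leftover share goes to `default` (assumed)."""
    roll = rng.uniform(0, 100)
    threshold = 0
    for percent, name in pairs:
        threshold += percent
        if roll < threshold:
            return name
    return default


parsed = parse_behavior_spec("5:error,10:delay")

# Simulate many requests with the "20:delay" specification.
rng = random.Random(42)
pairs = parse_behavior_spec("20:delay")
picks = [pick_behavior(pairs, rng) for _ in range(10000)]
delay_share = picks.count("delay") / len(picks)
```

Over many requests, delay_share should land close to 0.2, which is what "20% of the time" means in practice.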

You can find a description of all built-in protocols here: Protocols.

You can pass options to the behavior using --behavior-NAME-OPTION options:

$ vaurien --protocol http --proxy localhost:8000 --backend google.com:80 \
    --behavior 20:delay \
    --behavior-delay-sleep 2

Passing all options through the command-line can be tedious, so you can also create an ini file for this:

[vaurien]
backend = google.com:80
proxy = localhost:8000
protocol = http
behavior = 20:delay

[behavior:delay]
sleep = 2
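If you want to sanity-check such a file before handing it to Vaurien, the layout above can be read with Python's standard configparser module. This is only a sketch of the file structure; how Vaurien itself loads the file is not shown here.

```python
import configparser

INI_TEXT = """
[vaurien]
backend = google.com:80
proxy = localhost:8000
protocol = http
behavior = 20:delay

[behavior:delay]
sleep = 2
"""

parser = configparser.ConfigParser()
parser.read_string(INI_TEXT)

# Main proxy settings live in the [vaurien] section.
main = dict(parser["vaurien"])

# Options for a behavior live in a [behavior:NAME] section.
delay_options = dict(parser["behavior:delay"])
sleep_seconds = parser.getfloat("behavior:delay", "sleep")
```

Each key in a [behavior:NAME] section plays the same role as a --behavior-NAME-OPTION flag on the command line.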

You can find a description of all built-in behaviors here: Behaviors.

You can also find some usage examples here: Examples.

Controlling Vaurien live

Vaurien provides an HTTP server with an API, which can be used to control the proxy and change its behavior on the fly.

To activate it, use the --http option:

$ vaurien --http

By default the server runs on localhost:8080, but you can change it with the --http-host and --http-port options.

See APIs for a full list of APIs.

Controlling Vaurien from your code

If you want to run and drive a Vaurien proxy from your code, the project provides a few helpers for this.

For example, if you want to write a test for your backend service that runs on host:port through a Vaurien proxy, you can write:

import unittest
from vaurien.util import start_proxy
from vaurien.util import stop_proxy
from vaurienclient import Client


class MyTest(unittest.TestCase):

    def setUp(self):
        # by default the HTTP service used for controlling vaurien
        # runs on localhost:8080, can be made to run on a different
        # host and port by using `http_host` and `http_port` as
        # argument to start_proxy.
        # by default the proxy is bound to localhost:8000, can be bound
        # to a different host and port by using `proxy_host` and
        # `proxy_port` as argument to start_proxy.

        self.proxy_pid = start_proxy(
            backend_host=host, # host where your backend service runs
            backend_port=port, # port where your backend service runs
            protocol='http' # :ref:`protocols`
        )

    def tearDown(self):
        stop_proxy(self.proxy_pid)

    def test_one(self):
        # client that connects to the HTTP server which controls vaurien
        client = Client(host='localhost', port=8080)

        with client.with_behavior('error', **options):
            # do something...
            pass

        # we're back to normal here

In this test, the proxy is started and stopped before and after the test, and the Client class will let you drive its behavior.

Within the with block, the proxy will error out any call by using the error behavior, so you can verify that your application behaves as expected when that happens.

Extending Vaurien

Vaurien comes with a handful of useful Behaviors and Protocols, but you can create your own and plug them in through a configuration file.

In fact, that’s the best way to create realistic issues: imagine you have a very specific type of error on your LDAP server every time your infrastructure is under heavy load. You can reproduce this issue in your behavior and make sure your web application behaves as it should.

Creating new behaviors and protocols is done by implementing classes with specific signatures.

For example if you want to create a “super” behavior, you just have to write a class with two special methods: on_before_handle and on_after_handle.

Once the class is ready, you can register it with Behavior.register:

from vaurien.behaviors import Behavior

class MySuperBehavior(object):

    name = 'super'
    options = {}

    def on_before_handle(self, protocol, source, dest, to_backend):
        # do something here
        return True

    def on_after_handle(self, protocol, source, dest, to_backend):
        # do something else
        return True

Behavior.register(MySuperBehavior)

You will find a full tutorial in Extending Vaurien.

Contribute

The code repository & bug tracker are located at https://github.com/mozilla-services/vaurien

Don’t hesitate to send us pull requests or open issues!

More documentation

And there is more! Have a look at the other sections of the documentation:

Behaviors

Vaurien provides a collection of behaviors, all of which are listed on this page. You can also write your own behaviors if you need to. Have a look at Extending Vaurien to learn more.

abort

Simulates a connection aborted by the client before it receives a response.

blackout

Immediately closes the client socket; no other action is taken.

delay

Adds a delay either before or after the backend is called.

Options:

  • before: If True, adds the delay before the backend is called; otherwise after (bool, default: True)
  • sleep: Delay in seconds (float, default: 1)
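How the two options interact can be shown with a small standalone sketch. SketchDelay is a hypothetical stand-in, not the built-in delay behavior; it only borrows the on_before_handle / on_after_handle method names and arguments used for custom behaviors elsewhere in this documentation.

```python
import time


class SketchDelay:
    """Hypothetical stand-in for a delay-style behavior (not Vaurien's code)."""

    name = "sketch-delay"

    def __init__(self, before=True, sleep=1.0):
        self.before = before  # sleep before the backend call if True
        self.sleep = sleep    # delay in seconds

    def on_before_handle(self, protocol, source, dest, to_backend):
        if self.before:
            time.sleep(self.sleep)
        return True

    def on_after_handle(self, protocol, source, dest, to_backend):
        if not self.before:
            time.sleep(self.sleep)
        return True


behavior = SketchDelay(before=True, sleep=0.05)

start = time.monotonic()
behavior.on_before_handle(None, None, None, True)
elapsed_before = time.monotonic() - start

start = time.monotonic()
behavior.on_after_handle(None, None, None, True)
elapsed_after = time.monotonic() - start
```

With before set to True, only the first hook pauses; flipping it to False moves the pause to the second hook, so the backend answers quickly but the client still waits.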

dummy

Transparent behavior. Nothing’s done.

error

Reads the packets that have been sent, then sends back “errors”.

Used in conjunction with the HTTP protocol, it will randomly send back a 501, 502 or 503.

For other protocols, it returns random data.

The inject option can be used to inject data within valid data received from the backend. The warmup option can be used to deactivate the random data injection for a number of calls. This is useful if you need the communication to settle in some specific protocols before the random data is injected.

The inject option is deactivated when the http protocol is used.

Options:

  • inject: Inject errors inside valid data (bool, default: False)
  • warmup: Number of calls before erroring out (int, default: 0)
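The warmup option can be read as a simple call counter. The sketch below illustrates that reading; WarmupCounter is a hypothetical helper, and the exact boundary (whether call number warmup + 1 is the first to fail) is an assumption rather than something this page specifies.

```python
class WarmupCounter:
    """Hypothetical helper: lets `warmup` calls through before erroring."""

    def __init__(self, warmup=0):
        self.warmup = warmup
        self.calls = 0

    def should_error(self):
        # Count the call, then error only once the warmup is used up.
        self.calls += 1
        return self.calls > self.warmup


counter = WarmupCounter(warmup=12)
results = [counter.should_error() for _ in range(14)]
```

With warmup set to 12, the first twelve calls pass untouched and the errors start on the thirteenth, which is how a failure can be placed in the middle of a multi-step exchange such as an SMTP transaction.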

hang

Reads the packets that have been sent then hangs.

Acts like a pdb.set_trace() you forgot in your code ;)

transient

No documentation. Boooo!

Options:

  • agitate: Number of calls before succeeding (int, default: 1)
  • inject: Inject errors inside valid data (bool, default: False)
  • warmup: Number of calls before erroring out (int, default: 0)

Protocols

Vaurien provides a collection of protocols, which are all listed on this page. You can also write your own protocols if you need to. Have a look at Extending Vaurien to learn more.

http

HTTP protocol.

Options:

  • buffer: Buffer size (int, default: 8124)
  • keep_alive: Keep the connection alive (bool, default: False)
  • overwrite_host_header: If True, the HTTP Host header will be rewritten with backend address. (bool, default: False)
  • reuse_socket: If True, the socket is reused. (bool, default: False)

memcache

Memcache protocol.

Options:

  • buffer: Buffer size (int, default: 8124)
  • keep_alive: Keep the connection alive (bool, default: False)
  • reuse_socket: If True, the socket is reused. (bool, default: False)

mysql

No documentation. Boooo!

Options:

  • buffer: Buffer size (int, default: 8124)
  • reuse_socket: If True, the socket is reused. (bool, default: False)

redis

Redis protocol.

Options:

  • buffer: Buffer size (int, default: 8124)
  • keep_alive: Keep the connection alive (bool, default: False)
  • reuse_socket: If True, the socket is reused. (bool, default: False)

smtp

SMTP Protocol.

Options:

  • buffer: Buffer size (int, default: 8124)
  • reuse_socket: If True, the socket is reused. (bool, default: False)

tcp

TCP handler.

Options:

  • buffer: Buffer size (int, default: 8124)
  • keep_alive: Keep the connection alive (bool, default: False)
  • reuse_socket: If True, the socket is reused. (bool, default: False)

APIs

You can control vaurien from its APIs. There is a REST API and a command-line API.

The REST API

GET /behavior

Returns the current behavior in use, as a JSON object.

Example:

$ curl -XGET http://localhost:8080/behavior
{
  "behavior": "dummy"
}

PUT /behavior

Sets the behavior. The behavior must be provided as a JSON object in the body of the request, with a name key for the behavior name, plus any options to pass to the behavior class.

Note

Don’t forget to set the “application/json” Content-Type header when doing your calls.

Example:

$ curl -XPUT -d '{"sleep": 2, "name": "delay"}' http://localhost:8080/behavior \
       -H "Content-Type: application/json"
{
  "status": "ok"
}
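The same PUT call can be prepared from Python with the standard library only. This is a sketch: build_set_behavior_request is a hypothetical helper, and the request is only built here, not sent, so it can be inspected without a running proxy.

```python
import json
import urllib.request


def build_set_behavior_request(name, base_url="http://localhost:8080", **options):
    """Build (but do not send) a PUT /behavior request."""
    payload = dict(options, name=name)
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        base_url + "/behavior",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


request = build_set_behavior_request("delay", sleep=2)

# To send it against a proxy started with --http:
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```

Keeping the request construction separate from the network call makes it easy to check the method, the Content-Type header and the JSON body in a unit test.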

GET /behaviors

Returns the list of behaviors that can be used.

Example:

$ curl -XGET http://localhost:8080/behaviors
{
"behaviors": [
    "blackout",
    "delay",
    "dummy",
    "error",
    "hang"
]
}

If you want to control vaurien from the command-line, you can do so by using vaurienclient. Running vaurienctl --help will give you some help.

Extending Vaurien

You can extend Vaurien by writing new protocols or new behaviors.

Writing Protocols

Writing a new protocol is done by creating a class that inherits from the vaurien.protocols.base.BaseProtocol class.

The class needs to provide three elements:

  • a name class attribute: the protocol will be known under that name.
  • an optional options class attribute: a mapping containing options for the protocol. Each option value is composed of a description, a type and a default value. The mapping is wired into the command-line when you run vaurien, and is also used to generate the protocol documentation.
  • a _handle method, which will be called every time some data is ready to be read on the proxy socket or on the backend socket.

The vaurien.protocols.base.BaseProtocol class also provides a few helpers to work with the sockets:

  • _get_data: a method to read data in a socket. Catches EWOULDBLOCK and EAGAIN errors and loops until they happen.
  • option: a method to get the value of an option
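The kind of read loop described for _get_data can be sketched with plain sockets. The function below is one possible reading of that description, not Vaurien's code: read_available is a hypothetical name, and it keeps reading from a non-blocking socket until EWOULDBLOCK or EAGAIN signals that nothing more is available.

```python
import errno
import socket


def read_available(sock, bufsize=8124):
    """Drain whatever is currently readable from a non-blocking socket."""
    chunks = []
    while True:
        try:
            chunk = sock.recv(bufsize)
        except OSError as exc:
            if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
                break  # nothing more to read right now
            raise
        if not chunk:
            break  # the peer closed the connection
        chunks.append(chunk)
    return b"".join(chunks)


# A local socket pair stands in for the proxy and its peer.
left, right = socket.socketpair()
right.setblocking(False)

nothing_yet = read_available(right)
left.sendall(b"hello ")
left.sendall(b"world")
data = read_available(right, bufsize=4)

left.close()
right.close()
```

A small bufsize is used in the example to force several recv calls, which shows that the loop reassembles the data regardless of how it was chunked.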

Example:

from vaurien.protocols.base import BaseProtocol


class TCP(BaseProtocol):
    name = 'tcp'
    options = {'reuse_socket': ("If True, the socket is reused.",
                                bool, False),
               'buffer': ("Buffer size", int, 8124),
               'keep_alive': ("Keep the connection alive", bool, False)}

    def _handle(self, source, dest, to_backend):
        # default TCP behavior
        data = self._get_data(source)
        if data:
            dest.sendall(data)
            if not self.option('keep_alive'):
                data = ''
                while True:
                    data = self._get_data(dest)
                    if data == '':
                        break
                    source.sendall(data)

                if not self.option('reuse_socket'):
                    dest.close()
                    dest._closed = True
                return False
        return data != ''

Once the protocol class is ready, it can be registered via the Protocol class:

from vaurien.protocols import Protocol
Protocol.register(TCP)

Writing Behaviors

Creating new behaviors is very similar to creating protocols.

XXX

Using your protocols and behaviors

XXX

Examples

Proxying on an HTTP backend and sending back 50x errors 20% of the time:

$ vaurien --protocol http --proxy 0.0.0.0:8888 --backend blog.ziade.org:80 \
          --behavior 20:error

You can simulate the same 50x errors 20% of the time in front of a backend running locally on port 80:

$ vaurien --protocol http --proxy 0.0.0.0:8888 --backend 0.0.0.0:80 \
          --behavior 20:error

An SSL SMTP proxy with a 5% error rate and 10% delays:

$ vaurien --proxy 0.0.0.0:6565 --backend mail.example.com:465 \
          --protocol smtp --behavior 5:error,10:delay

An SSL SMTP Proxy that starts to error out after 12 calls (so in the middle of the transaction):

$ vaurien --proxy 0.0.0.0:6565 --backend mail.example.com:465 \
          --protocol smtp --behavior 100:error --behavior-error-warmup 12

Adding a 1 second delay on every call to a MySQL server:

$ vaurien --proxy 0.0.0.0:3307 --backend 0.0.0.0:3306 --stay-connected --behavior 100:delay \
          --behavior-delay-sleep 1

A quick’n’dirty SSH tunnel from your box to another box:

$ vaurien --stay-connected --proxy 0.0.0.0:8887 --backend 192.168.1.276:22 \
    --protocol-tcp-keep-alive