High Performance HTTP Load Testing
Engulf is a scalable and distributed HTTP benchmarker, designed to let you spin up and coordinate a cluster of workers with nothing more than a single JAR. Engulf's backend is written in Clojure, the frontend in JavaScript. Engulf is fully open-source.
Features Include:
Make sure Java is installed by running
java -version
in a terminal. The output should look something like java version "1.7.XXXX".
To run Engulf in the default combined (master + worker) mode, execute:
java -server -jar engulf-VERSION.jar
Client/server configuration of Engulf is simple. One master node is started, which will listen on port 4025. Then, any number of worker nodes may be started that point to the master. See the examples below:
Starting the master: java -server -jar engulf.jar --mode master
Starting a worker: java -server -jar engulf.jar --mode worker --connect-to 127.0.0.1:4025
Switch              Default           Description
--http-port         4000              Listen on this port for the HTTP UI
--manager-port      4025              TCP port for the manager to listen on
--mode              combined          One of: combined, master, worker
--connect-to        localhost:4025    When in worker mode, connect to this TCP host:port
--no-help, --help   false             Show help, then exit
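For example, the following hypothetical invocation combines these switches to start a master whose web UI listens on port 8080 instead of the default 4000:
java -server -jar engulf.jar --mode master --http-port 8080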
All data is saved to $HOME/.engulf.sqlite3
on the master node. If you wish to back this up, be sure to stop any running instances and make a copy of this file, as sketched below.
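A minimal backup sketch along those lines (the destination path is only an example):
# Stop any running Engulf instances first, then copy the database somewhere safe
cp $HOME/.engulf.sqlite3 /path/to/backups/engulf-backup.sqlite3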
Engulf opens a lot of ports very quickly, and most OSes have default limits that conflict with that, especially when keep-alive is disabled. To combat this, consider raising your system's maximum open files via ulimit and widening its ephemeral port range, as sketched below.
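For example, on Linux (the exact values and sysctl keys are assumptions that vary by OS and workload):
# Raise the open-file limit for the current shell
ulimit -n 65536
# Widen the ephemeral port range used for outgoing connections
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"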
Engulf is fastest after a few short test runs; you will notice it gets faster after boot once it has run a few test cases. This is because Engulf is written in Clojure, which runs on the Java Virtual Machine: the HotSpot JIT recompiles code as it runs, making it progressively faster.
Perhaps the easiest way to start up Engulf is with Amazon CloudFormation. The CloudFormation templates will set up either a single instance or a cluster of instances with just a few clicks, along with any security and scaling groups as required. These instances are based on a custom AMI with operating system and JVM settings tuned for optimal performance.
Click here to launch a single instance of Engulf. Click 'continue' on the first screen, then simply fill in the instance size when prompted. Be sure to use the API name for a given instance type from this list.
When the CloudFormation template is done running, check its output pane on the Amazon Console. There should be a field labeled "URL", with a corresponding URL to its right. Open the URL in your browser to use the web interface.
Please note, all data is saved on the EBS volume of your EC2 instance. If you wish to keep that data for later, make sure never to terminate the instance; only stop and start it. For other options, read the section on backups.
Click here to launch an Engulf cluster using CloudFormation. Click 'continue' on the first screen. The number of desired workers can be specified up-front, as can the instance sizes of the workers and master. Please use an instance type from this list. Workers can be scaled up and down by re-running the CloudFormation template (just click the link above a second time and alter the number of nodes).
You may not want to keep the full cluster running at all times. If you do not mind losing data, simply delete the CloudFormation stack. If you would like to keep your data around, the best strategy is to 'stop' the master node and 'terminate' the workers in the EC2 console. This leaves your data intact on the master and discards the workers, which hold no data. To start it back up again, start the master and wait for it to fully boot. Then visit the CloudFormation home, select 'EngulfCluster', click 'Update Stack', select 'Provide a Template URL', and paste in http://engulf-project.s3.amazonaws.com/engulf-cluster.template.json
If you would like to build your own custom cluster on AWS feel free to make use of the custom AMI (ami-5f2a9f36) used by the CloudFormation recipes.
The pre-built Engulf AMI is configured with all the correct dependencies, always runs the latest version of Engulf, and uses tuned kernel and JVM settings for optimal performance. Options for Engulf can be passed through the user-data option when booting.
Once the instance has booted, point your browser at a URL like
http://ec2yourinstance.compute1.amazonaws.com:8080
In an Engulf cluster you must provision a single master and multiple workers.
Boot the master with the user-data option:
--mode master
Note the master's internal EC2 hostname (it will look something like ip-0-0-0-0.ec2.internal); you will need this for the next step.
Boot each worker with the user-data option:
--mode worker --connect-to MASTER-INTERNAL-HOSTNAME:4025
Don't forget to append ':4025' to the hostname!
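As a rough sketch only, booting such a worker from the AMI with the AWS CLI might look like the following; the instance type and CLI tooling are assumptions and not part of the CloudFormation templates:
# Hypothetical example: boot a worker from the Engulf AMI, passing options via user-data
aws ec2 run-instances --image-id ami-5f2a9f36 --instance-type m1.large \
  --user-data '--mode worker --connect-to MASTER-INTERNAL-HOSTNAME:4025'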
Simply point your browser at
http://yourmasterinstance.compute1.amazonaws.com:8080
You should see a number of nodes matching the number of workers you've started up. If the number of nodes shown is 0, something is off!
All data in the Engulf VM is persisted to /home/ubuntu/.engulf.sqlite3
on the master node. You can ssh into the node using your provisioned key as the 'ubuntu' user.
Either make sure to keep the master's EBS volume around, or back up this file yourself (if backing up manually, you will probably want to stop any running Engulf processes first).
Engulf can be stopped by executing sudo /etc/init.d/engulf stop
. It can be started again with sudo /etc/init.d/engulf start
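Putting the pieces above together, a manual backup on the instance (the destination filename is just an example) might look like:
# Stop Engulf, copy the database, then start it again
sudo /etc/init.d/engulf stop
cp /home/ubuntu/.engulf.sqlite3 /home/ubuntu/engulf-backup.sqlite3
sudo /etc/init.d/engulf start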
In this REST API, each benchmark executed is represented by a single job with multiple nested results.
POST /jobs/current
+ JSON Body
A new benchmark can be started by issuing a POST request to /jobs/current
with a JSON body of parameters. The parameters for jobs are:
Parameters:
_stream: if set to "true", the HTTP connection will stay open while the benchmark is running and stream results back in chunks. If set to "false", the request returns the job metadata immediately.
Single URL Targets
{ "type": "url", "url": "http://example.net", "method": "get", // Can be get/post/put/patch/delete "timeout": 1000, // Optional, defaults to 30000ms "keep-alive": "true" // required, either "true" or "false" as strings }
Markov URL Targets
See this blog post for more information on Markov HTTP testing in general. Each dictionary in the corpus takes the same arguments as a single URL target does. Per-client options (like keep-alive) are derived from the first URL in the list. An abbreviated syntax is also supported, where a bare URL string is used in lieu of a dictionary; a GET request is assumed in that case.
{ "type": "markov", "corpus": [{"url": "http://example.net/a", "method": "get", "keep-alive": "true"} {"url": "http://example.net/b", "method": "get"}, "http://example.net/simple"] }
Examples:
# Testing a single URL
curl -XPOST http://localhost:4000/jobs/current -H 'Content-Type: application/json' -d '{
"title": "A Simple Test", "_stream":"true",
"params": {"formula-name":"http-benchmark", "concurrency":5, "limit":50000,
"target": {"type": "url", "url": "http://localhost:8081", "keep-alive":"true", "timeout":50, "method":"get"}}}'
# An example of testing using a markov-chain
curl -XPOST http://localhost:4000/jobs/current -H 'Content-Type: application/json' -d @markov.json
In markov.json:
{
"title": "a test",
"_stream": "true",
"params": {
"formula-name": "http-benchmark",
"concurrency": 5,
"timeout": 50,
"limit": 50000,
"keep-alive": "true",
"target": {
"type": "markov",
"corpus": [
"http://localhost:8081/foo",
"http://localhost:8081/bar",
"http://localhost:8081/bar",
"http://localhost:8081/baz",
{
"method": "POST",
"url": "http://localhost:8081/fancy"
},
"http://localhost:8081/foo",
"http://localhost:8081/baz",
{
"method": "POST",
"url": "http://localhost:8081/fancy"
},
"http://localhost:8081/foo",
"http://localhost:8081/foo",
{
"method": "POST",
"url": "http://localhost:8081/fancy"
}
]
}
}
}
DELETE /jobs/current
Stops the currently running benchmark instantly. Returns a representation of the job.
Example: curl -XDELETE http://localhost:4000/jobs/current
GET /jobs
Returns a paginated list of jobs. This will only return the job metadata. To retrieve results, view the job-get API below.
Parameters: page (which page of results to return) and per-page (how many jobs to return per page); see the example below.
Example: curl 'http://localhost:4000/jobs?page=1&per-page=10'
GET /jobs/UUID
A single job can be retrieved at /jobs/UUID-HERE
. It will include all the results nested inside of it.
Example: curl http://localhost:4000/jobs/ef3062e1-7abc-4769-a96c-a654c4219f5c
DELETE /jobs/UUID
This will attempt to delete the job specified by UUID at /jobs/UUID-HERE
. Jobs that are still running cannot be deleted, and attempting to delete one will return an HTTP status of 409 - Conflict
.
Example: curl -XDELETE http://localhost:4000/jobs/ef3062e1-7abc-4769-a96c-a654c4219f5c
GET /river
supports websockets
The River API consists of a single endpoint /river
that understands both plain GET and WebSocket requests. Either way, it returns a stream of JSON messages representing state-changes and results within Engulf.
Most messages sent through the River API use the EAV pattern, and consist of a map of the form: {"entity": "msg-entity", "name": "msg-name", "body": "msg-body"}.
Note that, aside from the initial snapshot, the River API consists only of deltas of node and job state changes. The initial request to /river will return messages named "current-nodes" and "current-jobs", which are snapshots of the current state of both resources. Messages received subsequently should be applied to update the client's local copy of that state.
More documentation about the River API is on the way, but a reasonable understanding can be gleaned by watching the output of curl /river
.
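For instance, the first messages from a plain GET to /river should look roughly like the following; the entity values and bodies are elided placeholders, only the envelope described above is shown:
curl http://localhost:4000/river
# {"entity": "...", "name": "current-nodes", "body": ...}
# {"entity": "...", "name": "current-jobs", "body": ...}
# ...followed by delta messages as nodes connect and jobs run.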
Information about connected worker nodes can be retrieved via this REST API.
GET /nodes
Returns a list of all connected nodes.
Example: curl http://localhost:4000/nodes
GET /nodes/UUID
Returns the metadata for a single connected node.
Example: curl http://localhost:4000/nodes/a-uuid-here
I'd like to thank Zach Tellman, whose work on the fantastic aleph library made much of Engulf possible. I'd also like to thank Trent Strong for the idea of generating requests with Markov chains.
I'd like to thank YourKit for providing this project with their Java profiler (which works excellently with Clojure). It's great at spotting performance issues. More info below:
YourKit is kindly supporting open source projects with its full-featured Java Profiler. YourKit, LLC is the creator of innovative and intelligent tools for profiling Java and .NET applications. Take a look at YourKit's leading software products: YourKit Java Profiler and YourKit .NET Profiler.
Copyright (C) 2011 and 2012 Andrew Cholakian
Distributed under the Eclipse Public License; see LICENSE for details.