Red Hat
Oct 6, 2011
by Ben Browning

If you haven't heard by now, TorqueBox 2.x is powered by JBoss AS7, which claims to be blazingly fast and lightweight. So, naturally, we want to put those claims to the test and see how TorqueBox 2.x stacks up against the competition.

Building on what we've learned from previous benchmarks (round 1, round 2), this latest round of benchmarking compares the performance of Spree running under:

TorqueBox 2.x (JRuby)
Trinidad (JRuby)
Passenger (REE and Ruby 1.9.2)
Unicorn (REE and Ruby 1.9.2)

Even if you're not a fan of JRuby, stick around to see how Ruby 1.9.2 compares to REE. From round 2 we know REE outperforms Ruby 1.8.7, but how does it compare to 1.9.2?

Why Spree?

Spree is a well-known Rails 3 application that can run under Ruby 1.8, 1.9, and JRuby. Based on feedback from our Redmine benchmarks, we wanted to make sure the next application could run under Ruby 1.9 for an accurate comparison of JRuby vs C Ruby performance.

The Setup

Spree is nice enough to ship with a set of sample data that we used for benchmarking. The benchmark script simulates users browsing around a few pages of the site, starting with a small number of concurrent users and gradually increasing until it finishes after 80 minutes.
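
For a rough idea of what that scenario looks like in code, here's a minimal Ruby sketch of a ramping browse load. This is not the actual Tsung configuration we used; the base URL, page paths, step sizes, and request counts are all illustrative assumptions.

```ruby
# Minimal sketch of a ramping browse load (illustrative only; the real
# benchmark was driven by Tsung). Base URL, paths, and ramp numbers are
# assumptions, not the values from the actual run.
require "net/http"
require "uri"

BASE  = URI("http://localhost:8080")    # assumed app server address
PAGES = ["/", "/products", "/cart"]     # hypothetical storefront pages

# One simulated user browsing the pages over a single connection.
def browse(base, pages)
  Net::HTTP.start(base.host, base.port) do |http|
    pages.each { |path| http.get(path) }
  end
end

# Gradually increase the number of concurrent simulated users.
(5..50).step(5) do |users|
  threads = Array.new(users) do
    Thread.new { 10.times { browse(BASE, PAGES) } }
  end
  threads.each(&:join)
  puts "completed step with #{users} concurrent users"
end
```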

More details about the benchmark and links to the raw results are at the bottom of the post.

The Results

Top Servers

[Graphs: throughput, latency, CPU usage, and free memory for each of the top servers]

Ignoring the latency graph for a minute, it's obvious that the runtime (JRuby vs Ruby 1.9.2) is the differentiator in throughput, CPU usage, and free memory. TorqueBox and Trinidad show no appreciable difference in these categories, but both clearly outperform Passenger and Unicorn. If you're concerned with maximizing throughput, minimizing CPU usage, or minimizing memory usage under load, you can't go wrong with either JRuby server.

However, what about the latency graph? It shows the average time taken for each request, in other words the average time a user would have to wait for a page on the site to load. This is where the difference between web servers, rather than runtimes, becomes readily apparent.

At peak load, TorqueBox has a lower latency than the nearest competitor, Passenger, by a factor of 8 and beats out Trinidad by a factor of 32. Note that the latency graph's y-axis has a logarithmic scale. To help illustrate this point, here's the same latency graph with a linear y-axis and Unicorn removed because its latency is so bad at the end of the test.

[Graph: average latency on a linear y-axis, with Unicorn omitted]

So, for a common real-world scenario, let's assume our application must keep its average response time under 1 second. How many requests per second can each server handle while staying under that 1-second mark? Looking at the latency and throughput graphs, we see that Trinidad can handle 45 requests per second, Passenger 90, Unicorn 100, and TorqueBox 130. At a peak load of 130 requests per second, TorqueBox's average response time is only 256 ms, well under our 1-second requirement.
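
That reading boils down to a trivial calculation: given a set of (throughput, average latency) points taken from the graphs, pick the highest throughput whose latency stays under the threshold. The sample points below are made up for illustration, not the measured data.

```ruby
# Hypothetical (requests/sec, average latency in ms) points read off a graph.
samples = [[40, 150], [80, 420], [120, 900], [160, 2400]]

# Highest throughput whose average latency stays under the 1-second requirement.
rps, latency = samples.select { |_r, ms| ms < 1000 }.max_by { |r, _ms| r }
puts "max sustainable load: #{rps} req/s at #{latency} ms average latency"
```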

If you were still skeptical about the performance benefits of switching to JRuby, the above graphs should be convincing enough to give it a shot.

TorqueBox 2.x vs TorqueBox 1.1.1

[Graphs: TorqueBox 2.x vs TorqueBox 1.1.1 (latency, throughput, CPU usage, and memory)]

We've seen how TorqueBox 2.x stacks up against the competition, but how does it compare to the latest 1.x stable release, TorqueBox 1.1.1? Thanks in large part to AS7, TorqueBox 2.x has lower latency, higher peak throughput, lower CPU usage, and lower memory usage than TorqueBox 1.1.1.

REE vs Ruby 1.9.2

[Graphs: Passenger and Unicorn under REE vs Ruby 1.9.2]

Ruby 1.9.2 gives Passenger and Unicorn lower latency, higher throughput, lower CPU usage, and more free memory than REE. From a performance standpoint, there's no reason not to use 1.9.2 if you must use a C Ruby.

The Details

All benchmarks were run on Amazon EC2 using an m1.large Tsung client instance, a c1.xlarge server instance, and a db.m1.large MySQL database instance. All instances were started in the same availability zone, and every benchmark started from a clean database loaded with Spree's sample data. Each benchmark run was performed twice on separate days, and the better of the two runs was used for the graphs.

TorqueBox and Trinidad were set to use a 2GB heap and a maximum of 100 HTTP threads to match the database connection pool size. Unicorn and Passenger were both started with 50 workers. In our testing, 50 was the sweet spot for maximum throughput; anything more only increased CPU and memory usage without any further increase in throughput.
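
For reference, the Unicorn half of that setup boils down to a one-line worker count in its Ruby configuration file. This is a minimal sketch rather than the exact config used for the runs; the listen port, timeout, and preload setting are assumptions.

```ruby
# config/unicorn.rb -- minimal sketch of the 50-worker setup described above.
worker_processes 50   # the worker count used for both Unicorn and Passenger
preload_app true      # assumption: load the Rails app once, then fork workers
listen 8080           # assumption: port the benchmark client pointed at
timeout 60            # assumption: kill workers stuck longer than a minute
```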

The Spree application used for benchmarking and all raw results are available in our GitHub repository.

Tsung Reports and Raw Results

If you'd prefer to take the raw data and analyze it yourself, the Tsung-generated reports and raw Tsung results are available for each server below.

TorqueBox: Tsung Report / Raw Results
Trinidad: Tsung Report / Raw Results
Passenger w/ REE: Tsung Report / Raw Results
Passenger w/ 1.9.2: Tsung Report / Raw Results
Unicorn w/ REE: Tsung Report / Raw Results
Unicorn w/ 1.9.2: Tsung Report / Raw Results

Questions? Comments? Leave us a comment on this post or get in touch via our mailing lists, IRC, or Twitter.