FEATURE: Extreme Speed
Wallaroo has been developed to provide correct results quickly. What does quickly mean? Wallaroo processes millions of events per second on a single machine, with median latencies that are measured in microseconds and tail latencies that are measured in single-digit milliseconds. Low latencies are all well and good, but what does that mean in practice?
Performance is highly dependent on the type of data, the computations being performed, etc. That said, we want to give you an idea of what the performance of a Wallaroo application looks like.
A synthetic benchmark
An application that we use to benchmark Wallaroo performance is “Market Spread.” You can learn more about it in our “Hello Wallaroo!” post.
Let’s talk about the Market Spread application and some recent performance numbers we’ve gotten with it. First, though, let’s highlight some key points:
- The logic is reasonably straightforward.
- The state it stores in memory remains constant in size.
- It has two streams of data, only one of which occasionally produces output that results in network traffic (1 out of 70,000).
- Messages are 50 to 60 bytes in size.
Market Spread Python
During a recent performance testing run using a single-threaded Python process on an AWS m4 class machine, we were able to run each stream of data at around 43k messages per second for a total of 86k messages per second across both streams. Our processing latencies were:
| <= | 0.5 ms | 1 ms | 2 ms | 2 ms | 4 ms |
86k messages a second with 9,999 out of 10,000 processed in 4 milliseconds or less is pretty impressive performance for a Python application.
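To make the "9,999 out of 10,000" reading concrete, here is a minimal sketch of how a latency bound like that can be computed from raw samples, using the nearest-rank percentile method. The `percentile` helper and the sample latencies are made up for illustration; this is not the tooling we used for the benchmark.

```python
import math
from fractions import Fraction

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it.
    pct is passed as a string (e.g. "99.99") to avoid float rounding."""
    ordered = sorted(samples)
    rank = math.ceil(Fraction(pct) * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds, invented for this example only.
latencies_ms = [0.2] * 9_000 + [0.9] * 900 + [1.8] * 99 + [3.9]

for pct in ("50", "99", "99.99"):
    bound = percentile(latencies_ms, pct)
    print(f"{pct}% of messages were processed in {bound} ms or less")
```

With 10,000 samples, the 99.99% bound is the 9,999th fastest message, which is why a single slow outlier does not move it.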
In this case, Python is the performance bottleneck. Other language bindings would be able to achieve even better performance. When running single-threaded on the same machine, the Pony version of Market Spread can do 120k messages per second per stream with a total of 240k messages per second across both streams. What about latencies?
| <= | 260 µs | 1 ms | 1 ms | 2 ms | 2 ms |
Not too bad. And to give you an idea of what Wallaroo can do when we really open it up…
When running the same Pony Market Spread application using 16 cores, we were able to run each stream of data at around 1.5 million messages per second (for a total of 3 million messages per second across both streams). Our processing latencies were:
| <= | 66 µs | 0.5 ms | 0.5 ms | 0.5 ms | 1 ms |
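One way to put the three throughput figures side by side is to turn each into a per-core rate and an average time budget per message. The sketch below does only that arithmetic on the numbers quoted in this section. It assumes the load is spread evenly across cores in the 16-core run, which is a simplification, and the helper name is our own.

```python
# Throughput quoted in this section: (messages per second across both streams, cores used).
runs = {
    "Python, single-threaded": (86_000, 1),
    "Pony, single-threaded": (240_000, 1),
    "Pony, 16 cores": (3_000_000, 16),
}

def per_core_budget_us(total_per_sec, cores):
    """Average time one core can spend per message, in microseconds,
    assuming messages are spread evenly across cores."""
    per_core_rate = total_per_sec / cores
    return 1_000_000 / per_core_rate

for name, (rate, cores) in runs.items():
    print(f"{name}: {rate / cores:,.0f} msgs/s per core, "
          f"about {per_core_budget_us(rate, cores):.1f} µs per message")
```

The budget is an average for throughput, not a latency: a message can wait in a queue for longer than its budget and still be counted in the latency figures above.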
Even with the simple application caveats that we laid out, we think that is some pretty impressive performance, and it is performance that we are committed to maintaining as we add new functionality to Wallaroo.
What’s the secret sauce?
Wallaroo’s performance comes from a combination of design choices and constant vigilance. Wallaroo uses an actor-model approach that encapsulates data, minimizes coordination, and brings state close to the computation. We test every feature for performance. We reimplement functionality when we find performance lacking. We go into more detail in our blog post “What’s the secret sauce?”
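If the actor model is new to you, the toy sketch below shows the general idea in Python: each actor owns its state, handles one message at a time from a mailbox, and every message for a given key is routed to the same actor, so no locks are needed. This is not Wallaroo's implementation or API. `CounterActor`, `route`, and the stock symbols are hypothetical names chosen for the example.

```python
import queue
import threading

class CounterActor:
    """Owns its state privately; the only way to change it is to send a message."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._counts = {}  # state lives next to the code that updates it
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def send(self, key):
        self._mailbox.put(key)

    def stop(self):
        """Let the actor drain its mailbox, then return a copy of its final state."""
        self._mailbox.put(None)  # sentinel: no more messages
        self._thread.join()
        return dict(self._counts)

    def _run(self):
        # Messages are handled one at a time, so _counts needs no lock.
        while (key := self._mailbox.get()) is not None:
            self._counts[key] = self._counts.get(key, 0) + 1


def route(key, actors):
    """Pick the actor for a key. The same key always maps to the same actor."""
    return actors[sum(key.encode()) % len(actors)]


actors = [CounterActor() for _ in range(2)]
for symbol in ["AAPL", "MSFT", "AAPL", "GOOG", "AAPL"]:
    route(symbol, actors).send(symbol)

totals = {}
for actor in actors:
    totals.update(actor.stop())  # keys never overlap between actors
print(totals)
```

Because each key has exactly one owner, the actors never coordinate with each other while processing, and that is the property the paragraph above describes.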