Wallaroo Core Concepts

A Wallaroo application consists of one or more pipelines. A pipeline takes in data from an external system, performs a series of computations based on that data, and optionally produces outputs which are sent to an external system.

The following diagram illustrates a single, linear pipeline:

[Figure: Simple linear pipeline]

An external data source sends data (say, over TCP) to an internal Wallaroo source. The Wallaroo source decodes that stream of bytes, transforming it into a stream of internal data types that are sent to a series of computations (C1, C2, and C3). Each computation takes an input and produces an output. Finally, C3 sends its output to a Wallaroo sink, which encodes that output as a series of bytes and sends it over TCP to an external system.
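The flow described above can be sketched in plain Python. This is only an illustration, not Wallaroo's API: the wire format (each message is a 4-byte big-endian signed integer), the function names, and the arithmetic done by C1, C2, and C3 are all assumptions made for the example.

```python
import struct

# Assumed wire format for this sketch: each message is a 4-byte
# big-endian signed integer, with no framing header.
MESSAGE_SIZE = 4


def decode(stream: bytes):
    """Source side: turn a stream of bytes into a stream of ints."""
    for offset in range(0, len(stream), MESSAGE_SIZE):
        (value,) = struct.unpack_from(">i", stream, offset)
        yield value


# Three hypothetical computations; each takes one input and returns one output.
def c1(x: int) -> int:
    return x + 1


def c2(x: int) -> int:
    return x * 2


def c3(x: int) -> int:
    return x - 3


def encode(value: int) -> bytes:
    """Sink side: turn an output value back into bytes."""
    return struct.pack(">i", value)


def run_linear_pipeline(stream: bytes) -> bytes:
    """Decode each message, apply C1 -> C2 -> C3, and encode the result."""
    out = bytearray()
    for value in decode(stream):
        out += encode(c3(c2(c1(value))))
    return bytes(out)


incoming = struct.pack(">ii", 1, 5)
outgoing = run_linear_pipeline(incoming)
print(struct.unpack(">ii", outgoing))  # (1, 9)
```

In a real deployment the bytes would arrive over a TCP connection rather than as a single in-memory value; the sketch keeps everything in memory so that the decode, compute, and encode stages are easy to see.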

A Wallaroo application can have multiple interacting pipelines. For example, an app could have one pipeline that takes data and updates state based on that data, and a second pipeline that takes data, does computations against the current state in the system, and produces output based on the state and data. The first of these has no sink, whereas the second does.
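As a rough sketch of two interacting pipelines, the plain-Python example below uses a hypothetical domain (latest prices per symbol) and hypothetical function names; it does not use Wallaroo's API. The first pipeline only updates shared state and has no sink. The second reads that state and writes to a sink, which is modeled here as a list.

```python
# Shared state, keyed by symbol. The price domain is an assumption
# chosen only to make the example concrete.
latest_price = {}


def update_pipeline(message):
    """Pipeline 1: updates state from incoming data. It has no sink."""
    symbol, price = message
    latest_price[symbol] = price


def order_pipeline(message, sink):
    """Pipeline 2: combines incoming data with current state and emits output."""
    symbol, quantity = message
    price = latest_price.get(symbol)
    if price is None:
        return  # no state yet for this symbol, so there is nothing to output
    sink.append((symbol, quantity * price))


sink = []
update_pipeline(("ABC", 10.0))
order_pipeline(("ABC", 3), sink)
order_pipeline(("XYZ", 1), sink)  # unknown symbol, produces no output
print(sink)  # [('ABC', 30.0)]
```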

Core Wallaroo abstractions

How does one go about building a Wallaroo application? Through our developer framework and its APIs, which are the focus of this guide. The core abstractions from our API that we'll touch on in this introduction are:

  • Computation
  • Pipeline
  • Source
  • Sink

The most important of these is the Computation. Computations come in two varieties: stateless and stateful. A stateless computation takes some data as an input and creates some new data as an output. For example, a “double computation” might take in an integer such as 2 and output 4. A stateful computation is similar to a stateless computation, except that it takes an additional input: the state it will operate on. An example of a stateful computation is a counter that keeps track of the running total of all the numbers it has processed.
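The two varieties can be illustrated with the examples from the paragraph above. This is a minimal sketch in plain Python; the names `double`, `RunningTotal`, and `add_to_total` are made up for the illustration and are not part of Wallaroo's API.

```python
def double(x: int) -> int:
    """Stateless computation: the output depends only on the input."""
    return x * 2


class RunningTotal:
    """State object that the stateful computation operates on."""

    def __init__(self):
        self.total = 0


def add_to_total(x: int, state: RunningTotal) -> int:
    """Stateful computation: takes the input plus the state, updates the
    state, and outputs the new running total."""
    state.total += x
    return state.total


state = RunningTotal()
print(double(2))  # 4
print([add_to_total(n, state) for n in (1, 2, 3)])  # [1, 3, 6]
```

Calling `double(2)` always gives the same result, whereas the result of `add_to_total` depends on everything the state has seen so far.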

You can combine computations together using another abstraction we provide: Pipeline. A pipeline allows you to say, for example, that the output from computation A will be processed by computation B. A pipeline begins with a Source step, which is responsible for receiving and decoding incoming external messages. Likewise, the pipeline may end at a Sink, if it has anything to output, which encodes data and sends it to an external receiver. In this way, you can take individual computations and start turning them into applications that take in data from various external sources and ultimately produce outputs that are sent to external systems via sinks.
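To make the idea of chaining concrete, here is a small sketch of a pipeline builder in plain Python. The class and its methods (`to`, `to_sink`, `process`) are hypothetical names chosen for this example and should not be read as Wallaroo's actual API. The decoder plays the role of the source step, and the encoder plays the role of the optional sink.

```python
class Pipeline:
    """Toy pipeline: a decoder, a list of computations, and an optional encoder."""

    def __init__(self, decoder):
        self.decoder = decoder  # source step: raw bytes -> internal value
        self.steps = []         # computations, applied in order
        self.encoder = None     # sink step; None means the pipeline has no sink

    def to(self, computation):
        """Append a computation; its input is the previous step's output."""
        self.steps.append(computation)
        return self

    def to_sink(self, encoder):
        """Terminate the pipeline with an encoder that produces output bytes."""
        self.encoder = encoder
        return self

    def process(self, raw: bytes):
        value = self.decoder(raw)
        for step in self.steps:
            value = step(value)
        if self.encoder is None:
            return None  # no sink, so nothing leaves the pipeline
        return self.encoder(value)


pipeline = (
    Pipeline(decoder=lambda raw: int(raw.decode("ascii")))
    .to(lambda x: x + 1)   # computation A
    .to(lambda x: x * 2)   # computation B consumes A's output
    .to_sink(lambda x: str(x).encode("ascii"))
)
print(pipeline.process(b"20"))  # b'42'
```

Returning `self` from `to` and `to_sink` is a design choice that lets the steps be declared in the order in which data flows through them.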

Concepts

  • State: The accumulated result of data processed over the course of time.
  • Computation: Code that transforms an input of some type In to an output of some type Out (or optionally None if the input should be filtered out).
  • State Computation: Code that takes an input of some type In and a state object of some type State, operates on that input and state (possibly making state updates), and optionally produces an output of some type Out.
  • Source: Input point for data from external systems into an application.
  • Sink: Output point from an application to external systems.
  • Decoder: Code that transforms a stream of bytes from an external system into a series of application input types.
  • Encoder: Code that transforms an application output type into bytes for sending to an external system.
  • Pipeline: A sequence of computations and/or state computations originating from a source and optionally terminating in a sink.
  • Application: A collection of pipelines.
  • Topology: A graph of how all sources, sinks, and computations are connected within an application.
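The Computation entry above notes that a computation may return None to filter out an input. One way this could behave is sketched below in plain Python; the runner function and the example computations are assumptions made for illustration, not Wallaroo code.

```python
from typing import Callable, Iterable, Iterator, Optional, Sequence


def only_even(x: int) -> Optional[int]:
    """Computation that filters: returns None for inputs that should be dropped."""
    return x if x % 2 == 0 else None


def square(x: int) -> int:
    """Ordinary computation: always produces an output."""
    return x * x


def run(inputs: Iterable[int], computations: Sequence[Callable]) -> Iterator[int]:
    """Pass each input through the computations in order. If any computation
    returns None, the input is dropped and later computations never see it."""
    for value in inputs:
        for computation in computations:
            value = computation(value)
            if value is None:
                break
        else:
            # Reached only when no computation filtered the value out.
            yield value


print(list(run(range(6), [only_even, square])))  # [0, 4, 16]
```

The check is `value is None` rather than a truthiness test, so a legitimate output of 0 is not mistaken for a filtered message.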

API

Wallaroo provides APIs for implementing all of the above concepts.

Read more in the Wallaroo API and Technical Documentation.
