Wednesday, September 25, 2013

Glisten, a Groovy way to use Amazon's Simple Workflow Service

by Clay McCoy

While adding a new automated deployment feature to Asgard, we realized that our in-memory task system was not sufficient for the new demands. Tasks would now run for hours or days rather than minutes, and that work needed to survive the failure of a single Asgard instance. We also wanted better asynchronous task coordination and the ability to distribute these tasks among a fleet of Asgard instances.

Amazon's Simple Workflow Service

Amazon's Simple Workflow Service (SWF) is a task-based API for building highly scalable and resilient applications. With SWF the progress of your tasks is persisted by AWS, while all the actual work is still done on your own servers. Your services poll for decision tasks and activity tasks. Decision tasks simply determine what to do next (start an activity, start a timer...) based on the workflow progress so far. This is high-level logic that orchestrates your activities and should execute very quickly. Activity tasks are where real processing is performed (calculations, contacting remote services, I/O...). SWF was exactly what we were looking for in a distributed task system, but we quickly realized that writing a workflow against the base SWF API can be arduous. You are responsible for many low-level operations, and your actual application logic can get lost in the mix.
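To make the split between decision tasks and activity tasks more concrete, here is a minimal Java sketch of a decider written as a plain function from workflow history to the next decision. It does not call the SWF API at all: the Event and Decision enums and the decide method are hypothetical simplifications invented for this illustration, and real SWF history events and decisions carry far more detail.

```java
import java.util.Arrays;
import java.util.List;

public class DeciderSketch {

    // Hypothetical, heavily simplified history events (not SWF's real event types).
    public enum Event { WORKFLOW_STARTED, FIRST_ACTIVITY_COMPLETED, SECOND_ACTIVITY_COMPLETED }

    // Hypothetical decisions a decider could hand back to the service.
    public enum Decision { SCHEDULE_FIRST_ACTIVITY, SCHEDULE_SECOND_ACTIVITY, COMPLETE_WORKFLOW }

    // A decision task: look at the progress so far and pick the next step.
    // No real work happens here, which is why it should return quickly.
    public static Decision decide(List<Event> history) {
        if (history.contains(Event.SECOND_ACTIVITY_COMPLETED)) {
            return Decision.COMPLETE_WORKFLOW;
        }
        if (history.contains(Event.FIRST_ACTIVITY_COMPLETED)) {
            return Decision.SCHEDULE_SECOND_ACTIVITY;
        }
        return Decision.SCHEDULE_FIRST_ACTIVITY;
    }

    public static void main(String[] args) {
        // The same function is called again each time the history grows.
        System.out.println(decide(Arrays.asList(Event.WORKFLOW_STARTED)));
        System.out.println(decide(Arrays.asList(
                Event.WORKFLOW_STARTED, Event.FIRST_ACTIVITY_COMPLETED)));
        System.out.println(decide(Arrays.asList(
                Event.WORKFLOW_STARTED, Event.FIRST_ACTIVITY_COMPLETED,
                Event.SECOND_ACTIVITY_COMPLETED)));
    }
}
```

Even in this toy form the orchestration logic is spread across history checks, which hints at why application logic gets buried when working at this level.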

Amazon's Flow Framework

Amazon anticipated our predicament and provided the Flow Framework, which is a higher-level API on top of SWF. It minimizes SWF-based boilerplate code and makes your workflow look more like ordinary Java code. It also provides a lot of useful SWF infrastructure for registering SWF objects, polling for tasks, analyzing workflow history, and responding with decisions. Flow enforces a programming model where you implement your own interfaces for workflows and activities.


The interfaces contain special Flow annotations that identify their roles and allow specification of versions, timeouts, and more.
@Workflow
@WorkflowRegistrationOptions(
    defaultExecutionStartToCloseTimeoutSeconds = 60L)
interface TestWorkflow {
    @Execute(version = '1.0')
    void doIt()
}
@Activities(version = '1.0')
@ActivityRegistrationOptions(
    defaultTaskScheduleToStartTimeoutSeconds = -1L,
    defaultTaskStartToCloseTimeoutSeconds = 300L)
interface TestActivities {
    String doSomething()
    void consumeSomething(String thing)
}
Flow generates code to make your activities asynchronous. In the generated code, activity return values and parameters are wrapped in Promises. Rather than the TestActivities above, you will program against the generated TestActivitiesClient below.
interface TestActivitiesClient {
    Promise<String> doSomething()
    void consumeSomething(Promise<String> thing)
}
The workflow implementation is your decider logic which gets replayed repeatedly until your workflow is complete. In your workflow implementation you can reference the generated activities client that was just described. Flow uses AspectJ and @Asynchronous annotations on methods to ensure that promises are ready before executing the method body that uses their results. In this example, 'doIt' is the entry point to the workflow due to the @Execute annotation on the interface above. First we 'doSomething' and wait on the result before we send it to 'consumeSomething'.
class TestWorkflowImpl implements TestWorkflow {
    private final TestActivitiesClient client = new TestActivitiesClientImpl()

    void doIt() {
        Promise<String> result = client.doSomething()
        waitForSomething(result)
    }

    @Asynchronous
    void waitForSomething(Promise<String> something) {
        client.consumeSomething(something)
    }
}
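The ordering guarantee in the example above (the dependent step runs only once the promise is ready) can be sketched with nothing but the JDK. This is an analogy, not Flow: java.util.concurrent.CompletableFuture stands in for Flow's Promise, a plain callback stands in for the @Asynchronous method, and the class name and log messages are made up for illustration. Flow achieves the same effect through history replay rather than in-process callbacks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class PromiseOrderingSketch {

    // Returns the order in which the two steps actually ran.
    public static List<String> run() {
        List<String> log = new ArrayList<>();

        // Stand-in for the Promise<String> returned by client.doSomething().
        CompletableFuture<String> result = new CompletableFuture<>();

        // Stand-in for the @Asynchronous method: registered now,
        // but its body does not run until 'result' has a value.
        result.thenAccept(value -> log.add("consumeSomething(" + value + ")"));

        log.add("doSomething scheduled");

        // The activity finishes; only now does the dependent step run.
        result.complete("thing");
        return log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Registering the consumer before the value exists is the key point: the code reads top to bottom, but the second step is deferred until its input is available.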
Flow clearly does a lot to ease the use of SWF. Unfortunately, its dependence on AspectJ and code generation kept us from using it as is. Asgard is a Groovy and Grails application that already has enough bytecode manipulation and runtime magic. Since Groovy itself is well suited to the job of hiding boilerplate code, we began to wonder if we could use it to get what we wanted from SWF.

Netflix OSS Glisten

Glisten is an ease-of-use SWF library developed at Netflix. It still uses core Flow objects but does not require AspectJ or code generation. Glisten provides WorkflowOperations and ActivitiesOperations interfaces that can be used by your WorkflowImplementation and ActivitiesImplementation classes, respectively. All of the SWF specifics are hidden behind these operation interfaces in SWF-specific implementations. There are also local implementations that allow for easy unit testing of workflows and activities.
Let's take a look at what a Glisten-based workflow implementation looks like. Without code generation or AspectJ we no longer have generated clients or the @Asynchronous annotation. Instead we use WorkflowOperations to provide 'activities' and 'waitFor', in addition to many other workflow concerns. Note that the Groovy annotation @Delegate is used here so that the public methods of WorkflowOperations appear on TestWorkflowImpl itself, just to clean up the code. As in the Flow example above, the 'doSomething' activity is scheduled and then we 'waitFor' its result to be ready. Once it is ready, the closure is executed and the 'consumeSomething' activity is called with the 'it' parameter. In Groovy you can use 'it' to refer to the implicit parameter that is made available to a closure. Here 'it' is the result of the Promise passed into 'waitFor'. This is a pretty dense example of how we are using Groovy to recover some of the syntactic sugar that we lost from Flow by removing AspectJ and code generation.
class TestWorkflowImpl implements TestWorkflow {
    @Delegate
    WorkflowOperations<TestActivities> workflowOperations = SwfWorkflowOperations.of(TestActivities)

    void doIt() {
        waitFor(activities.doSomething()) {
            activities.consumeSomething(it)
        }
    }
}
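The idea of writing a workflow against an operations interface and swapping in a local implementation for unit tests can be sketched in plain Java. This is not Glisten's API: the Operations interface, the LocalOperations class, and the use of CompletableFuture in place of Flow's Promise are all assumptions made for illustration, and the real WorkflowOperations interface covers many more concerns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class LocalOperationsSketch {

    // Hypothetical, reduced operations interface: the only thing the workflow sees.
    public interface Operations {
        CompletableFuture<String> doSomething();
        void consumeSomething(String thing);
        <T> void waitFor(CompletableFuture<T> promise, Consumer<T> work);
    }

    // The workflow logic, written once against the interface.
    public static void doIt(Operations ops) {
        ops.waitFor(ops.doSomething(), ops::consumeSomething);
    }

    // Hypothetical local implementation: everything runs in-process and
    // immediately, so a unit test needs no SWF connection.
    public static class LocalOperations implements Operations {
        public final List<String> consumed = new ArrayList<>();

        @Override
        public CompletableFuture<String> doSomething() {
            return CompletableFuture.completedFuture("thing");
        }

        @Override
        public void consumeSomething(String thing) {
            consumed.add(thing);
        }

        @Override
        public <T> void waitFor(CompletableFuture<T> promise, Consumer<T> work) {
            // The promise is already complete here, so 'work' runs right away.
            promise.thenAccept(work);
        }
    }

    public static void main(String[] args) {
        LocalOperations ops = new LocalOperations();
        doIt(ops);
        System.out.println(ops.consumed);
    }
}
```

An SWF-backed implementation of the same interface could then be substituted in production without touching the workflow method, which is the separation the operations interfaces are meant to provide.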
Glisten is a lightweight way to use SWF and only requires a dependency on Groovy. Most of your code can still be written in Java if you prefer. Glisten is currently used in Asgard to enable long-lived deployment tasks. A comprehensive example workflow is included in the Glisten codebase and documented on the wiki. It demonstrates many SWF features (timers, parallel tasks, retries, error handling...) along with unit tests.

Glisten makes it easier for us to use Amazon's SWF, and maybe it can help you too. If you are interested in helping develop projects like this, feel free to contribute or even join us at Netflix.