
Thursday, February 28, 2013

High Availability Workflows

Since joining Avtex, I have been able to expand my horizons and gain exposure to customers with unique needs.  I try as hard as possible to build on CRM’s out-of-box experience and to avoid writing code I don’t need.  To that end, I’d like to share a simple way to make sure no Workflow triggers are missed while a Workflow is deactivated for updating.

It’s no secret that business processes change on the fly.  Implementing changes to active Workflows can be tricky from an availability standpoint.  Most companies adopt a routine of modifying Workflow designs after hours, or of holding operations until the modification is complete.  This presents a real and potentially troubling hurdle for “always on” companies.

Because Workflows are listeners to CRM operations, rather than direct participants, any downtime for a particular Workflow means it is no longer listening to events.  Events that occur during that window never have the business logic applied to them, and the resulting gaps can be very difficult to diagnose or troubleshoot.

Though the downtime can be reduced to mere minutes by developing in an alternate environment and importing the updated Workflow as a Solution, there is still a window in which actions can tiptoe past a disabled Workflow.  For some companies, this is simply unacceptable.

However, you can use out-of-box Workflow capabilities to create high-availability Workflows that can be taken offline, modified, and reactivated without missing a single event triggered while they were offline.  This works by splitting the Workflow’s functionality into two separate Workflows:

  1. An event-listening “Dispatcher” Workflow; and
  2. A “Business Logic” application Workflow

By isolating the business logic in a “child Workflow” that is called by its corresponding Dispatcher, you can take the Business Logic offline while leaving the Dispatcher running.  The Dispatcher’s configured triggers keep operating continuously, though the step that calls its Business Logic counterpart will fail during the downtime.
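To make the split concrete, here is a toy, in-memory model of the pattern. It is only a sketch under stated assumptions: the class and function names (`BusinessLogicWorkflow`, `Dispatcher`, `DispatcherJob`, `set_rating`) are hypothetical and do not correspond to any CRM SDK type. It shows the key behavior: the Dispatcher still catches every event, and when its child is deactivated the call step fails into “Waiting” instead of the event being lost.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Toy model of the Dispatcher / Business Logic pattern.
# Hypothetical classes for illustration only; not CRM SDK types.


@dataclass
class BusinessLogicWorkflow:
    """Child workflow: does the real work, but only while active."""
    name: str
    logic: Callable[[dict], None]
    active: bool = True


@dataclass
class DispatcherJob:
    """One Dispatcher system job, created for one triggering event."""
    record: dict
    status: str = "In Progress"
    error: str = ""


@dataclass
class Dispatcher:
    """Always-on listener whose only step calls its child workflow."""
    child: BusinessLogicWorkflow
    jobs: List[DispatcherJob] = field(default_factory=list)

    def on_event(self, record: dict) -> DispatcherJob:
        # The trigger fires regardless of the child's state.
        job = DispatcherJob(record)
        self.jobs.append(job)
        self.run(job)
        return job

    def run(self, job: DispatcherJob) -> None:
        if not self.child.active:
            # Calling a deactivated child fails, so the job waits
            # instead of the event being silently dropped.
            job.status = "Waiting"
            job.error = f"'{self.child.name}' is not activated"
            return
        self.child.logic(job.record)
        job.status = "Succeeded"
        job.error = ""


def set_rating(account: dict) -> None:
    # Hypothetical business rule applied to a new Account.
    account["rating"] = "Standard"


logic = BusinessLogicWorkflow("Account Business Logic", set_rating)
dispatcher = Dispatcher(logic)

first = dispatcher.on_event({"name": "Contoso"})
logic.active = False  # Business Logic taken offline for edits
second = dispatcher.on_event({"name": "Fabrikam"})

print(first.status, first.record)
print(second.status, second.error)
```

The Dispatcher holds a fixed reference to one child, mirroring the point that a Dispatcher always targets one specific Business Logic Workflow.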

Though the Dispatcher jobs enter a “Waiting” state, they are easy to identify and resume (especially if you allow successful jobs to delete themselves).  This behavior is generally enough to allow a wider window for adjusting the Business Logic: once you resume the waiting jobs, the affected records are processed against the new logic without any further intervention.  That brings up another excellent advantage of this pattern:

With Dispatchers, you can retire the existing logic immediately, and every event captured from that point on will be processed by the updated logic.

Note:  You cannot retarget the Dispatcher to a different workflow, because that would require taking the Dispatcher offline, which defeats the purpose.  The System Job acts as a cloned instance of a Workflow, so the Dispatcher will always target one specific Business Logic Workflow.  You can approximate retargeting by juggling Dispatchers, but you would then have to mitigate overlapping triggers.

Here’s an example scenario that updates an Account using the Dispatcher and Business Logic pattern:

First, create the Business Logic workflow, and for “Available to Run” select “As a child process”.  Remove all selections from “Options for Automatic Processes”.

image

Then, create the Dispatcher workflow with the “Options for Automatic Processes” settings you need, and add a step that calls your Business Logic workflow as a child workflow.

image

You may now activate both.  Your Dispatcher is diligently watching the events, and the Business Logic is processing your rules.

Here is what happens when you deactivate the Business Logic workflow to make modifications:

image

My example uses a Dispatcher that listens for Account creation, so when I create a new account, here is what I see in the “Workflows” associated with it:

image

As you can see, the Dispatcher caught the event, and then entered a “Waiting” state.  If we examine the job, we can see the error:

image

It failed on the step that calls my Business Logic, and the job will remain in this state until I resume it.  After completing my modifications to the Business Logic, I reactivate it.  Then I need to identify all my outstanding Dispatcher jobs:

image

Then I resume them, confident that I have missed no important triggers while my Business Logic was briefly offline:

image
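The identify-and-resume step can be sketched the same way. This is a hypothetical, self-contained model rather than a CRM API: `Child`, `Job`, `outstanding`, `resume`, and `updated_logic` are illustrative names only. Filtering on “Waiting” stands in for the System Jobs view, and resuming re-runs the failed call step, which is why resumed jobs pick up whatever logic is active at that moment.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical sketch: after the Business Logic workflow is reactivated,
# find every Dispatcher job left in "Waiting" and resume it.
# Types and function names are illustrative, not CRM SDK APIs.


@dataclass
class Child:
    logic: Callable[[dict], None]
    active: bool = True


@dataclass
class Job:
    record: dict
    status: str


def outstanding(jobs: List[Job]) -> List[Job]:
    """Stand-in for filtering System Jobs on status 'Waiting'."""
    return [job for job in jobs if job.status == "Waiting"]


def resume(jobs: List[Job], child: Child) -> int:
    """Re-run the failed 'call child' step for each waiting job."""
    if not child.active:
        raise RuntimeError("Reactivate the Business Logic workflow first")
    resumed = 0
    for job in outstanding(jobs):
        child.logic(job.record)  # runs whatever logic is active now
        job.status = "Succeeded"
        resumed += 1
    return resumed


def updated_logic(account: dict) -> None:
    # Hypothetical modified rule saved while the child was offline.
    account["rating"] = "Preferred"


jobs = [
    Job({"name": "Contoso", "rating": "Standard"}, "Succeeded"),
    Job({"name": "Fabrikam"}, "Waiting"),
    Job({"name": "Northwind"}, "Waiting"),
]
child = Child(updated_logic, active=True)  # reactivated after edits

count = resume(jobs, child)
print(count, [job.record for job in jobs])
```

Note that the job which succeeded before the outage keeps its old result; only the waiting jobs run against the updated rule.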

That said, I always perform a quick validation, just to be sure:

image