BusinessEngine — StateMachine implementation

Apr 23, 2021

Let me share my thoughts about using a state machine as a tool in defining business processes. I will also present it from the point of view of state machine implementation.

To follow the text you should already be familiar with the finite state machine concept. If you are not, I recommend reading about it first. Knowing the concept and being able to create and read a state machine diagram is useful for presenting, analyzing, and defining process flows.

I will focus on the idea that a state machine can be in only one state at any time. There are implementations that allow composite states or parallel executions. But in my opinion, if a state machine definition is big, people may have trouble understanding the process and, with time, simply stop checking the diagrams. I think that composite states or parallel executions may lead to such growing process definitions. An alternative is to define many small processes and organize / categorize them, although this carries the danger of creating too many places to check.
After all, the important thing is that the process is understandable to any person in the company, in any position.

I will use a room reservation process as an example for this article. It could be done differently, but I wanted to present many different concepts using one example. Do not focus on the flow itself, but on the possibilities a state machine gives in terms of designing a business process.

Basics:

An important thing to consider while implementing state machine functionality is to focus on the basic idea and not on special features. The basic ideas of this implementation are:

the machine can be in only one state at a time
it decides when a transition happens
it decides which transition to follow (for states with multiple transitions)

The definition of the state machine in the code could be represented as an array (more details will come later). The flow represents a room reservation. We need to track every reservation made by a customer, so for every reservation we need to store which state it is in, and we need a unique identifier for every running reservation process. Let’s make one thing clear.
The logic of moving inside the state machine has nothing to do with its storage. The logic should make it possible to set the state machine to any given state and to move between states. The reason for this separation is to keep the state machine implementation independent of any storage engine. Be aware that a complete solution still requires connecting storage with the usage of the state machine.
One solution could be a cronjob that iterates over all active process instances and checks / runs the possible transitions.
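A minimal sketch of this separation could look like the following. The class name TinyStateMachine, the shape of the definition array and the in-memory $storage array are assumptions made for illustration, they are not taken from the actual implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical minimal machine: it knows its definition and one active state,
// and nothing about where that state is stored.
final class TinyStateMachine
{
    private array $definition;
    private string $activeState;

    public function __construct(array $definition, string $activeState)
    {
        $this->definition = $definition;
        $this->activeState = $activeState;
    }

    // Follows the first outgoing transition, if any, and returns the resulting state.
    public function moveForward(): string
    {
        $targets = $this->definition[$this->activeState] ?? [];
        if ($targets !== []) {
            $this->activeState = $targets[0];
        }
        return $this->activeState;
    }
}

// Assumed definition format: state name => list of target states.
$definition = [
    'new' => ['waiting_for_payment'],
    'waiting_for_payment' => ['booked'],
    'booked' => [],
];

// Stand-in for a storage engine: unique process id => stored state name.
$storage = ['res-1' => 'new', 'res-2' => 'waiting_for_payment', 'res-3' => 'booked'];

// What a single cronjob run could do: load the state, try to move, persist the result.
foreach ($storage as $processId => $stateName) {
    $machine = new TinyStateMachine($definition, $stateName);
    $storage[$processId] = $machine->moveForward();
}
```

Swapping the $storage array for a database table would not require any change in the machine class, which is the point of the separation.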

Actions:

On the diagram, actions are visible near transitions and are prefixed with ‘/’.
From a business perspective we expect to execute some actions at specific moments of the process, e.g. sending an email or creating a task for customer service.
It must be possible to specify when the action should run: while entering the state and / or while leaving it. If the action is specified on the transition, reaching a state means something already happened, it’s done. If the action happens while leaving the state, it means something is about to happen.
Your implementation can also handle both cases, but then there is a question: “Where should I define my action?”.
For this implementation I only allow defining the actions on transitions.
If something goes wrong, the transition will not be considered successful and the state will not change.
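To illustrate the rule that a failed action keeps the process in its current state, here is a small sketch. The function followTransition, the 'action' and 'target' keys and the action names are hypothetical, and treating a thrown exception as a failure is an assumption.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: runs the action defined on the transition and returns
// the state the process ends up in.
function followTransition(string $currentState, array $transition, array $actions): string
{
    $actionName = $transition['action'] ?? null;
    if ($actionName !== null) {
        try {
            $succeeded = $actions[$actionName]();
        } catch (Throwable $e) {
            $succeeded = false; // e.g. the API is not reachable
        }
        if ($succeeded !== true) {
            return $currentState; // transition not successful, state unchanged
        }
    }
    return $transition['target'];
}

// Assumed actions: one succeeds, one fails with an exception.
$actions = [
    'sendConfirmationEmail' => function (): bool {
        return true;
    },
    'chargeCreditCard' => function (): bool {
        throw new RuntimeException('payment API unavailable');
    },
];

$afterEmail = followTransition(
    'confirmed',
    ['target' => 'booked', 'action' => 'sendConfirmationEmail'],
    $actions
);
$afterCharge = followTransition(
    'waiting_for_payment',
    ['target' => 'paid', 'action' => 'chargeCreditCard'],
    $actions
);
```

Here $afterEmail is 'booked', while $afterCharge stays 'waiting_for_payment', so the next run can simply try the same transition again.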

If you think about it, the implementation for either case is very similar. First you need to identify the transition you are allowed to follow and then grab the action name either from the state or from the transition itself. In case you want to support executing actions while entering and while leaving the state, just make sure the actions are called in the right order: first the one defined on the state, second the one defined on the transition.
This implementation also handles the action’s return value: when the action cannot be executed, for example because an API is not available or the connection to the database is lost, the process will not move to the next state. In case you choose to support both on_enter and on_exit state actions, consider what the behavior should be if one of the actions fails. Sometimes you need to make sure that the action is not executed more than once.

Another thing that sometimes comes up: “Why not define multiple actions per state / transition?”. Because if something fails, you need to know what.
Of course it can be implemented, but then it requires integration with some kind of storage mechanism, and on the diagram you should also present which actions did not execute. If there are multiple actions, should the transition be considered successful when all of them are done, or any of them? The order may also need to be considered. An alternative to multiple actions is to define a chain of state transitions. Again, it’s a matter of making a decision based on the requirements and simply implementing it.

Something to think about: if you try to add all the functionality you can think of, you may discover that it is not needed, not used or misused. If you have something simple to start with, it should also be simple to modify and adjust to the needs. Very often I find it more difficult to modify existing behaviors (especially the overcomplicated ones) than to create new ones.

Conditions / Events:

Conditions block transitions and are also responsible for pushing the process into the right state. If there are multiple outgoing paths, the one that will be followed is the one whose condition is fulfilled or the one which has no condition.
This implementation also relies on the result of the action. As mentioned before, actions are defined on transitions. If the action fails or returns a boolean false, e.g. because of a failing API call, the state will not change.
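Selecting the outgoing path could be sketched as below. The function borrows the name findNextOpenTransition from the method list later in the article, but the body, the array keys and the rule "first open transition in definition order wins" are assumptions, not the actual implementation.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: returns the first transition whose condition is fulfilled
// or that has no condition at all, in definition order.
function findNextOpenTransition(array $transitions, array $conditions, array $context): ?array
{
    foreach ($transitions as $transition) {
        $conditionName = $transition['condition'] ?? null;
        if ($conditionName === null || $conditions[$conditionName]($context) === true) {
            return $transition;
        }
    }
    return null; // every path is blocked, the process stays where it is
}

// Assumed conditions: they only read the context and never modify it.
$conditions = [
    'isPaid' => function (array $context): bool {
        return $context['paid'];
    },
    'paymentTimedOut' => function (array $context): bool {
        return $context['now'] > $context['payment_deadline'];
    },
];

$transitions = [
    ['target' => 'booked', 'condition' => 'isPaid'],
    ['target' => 'cancelled', 'condition' => 'paymentTimedOut'],
];

// Not paid and past the deadline: the timeout path is open.
$open = findNextOpenTransition(
    $transitions,
    $conditions,
    ['paid' => false, 'now' => 200, 'payment_deadline' => 100]
);

// Not paid and still before the deadline: nothing can be followed yet.
$blocked = findNextOpenTransition(
    $transitions,
    $conditions,
    ['paid' => false, 'now' => 50, 'payment_deadline' => 100]
);
```

Because the definition order decides which open path wins in this sketch, an unconditional transition is best placed last.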
One important thing to mention: conditions should not contain any logic that could affect the business process. Of course you can have timeout-like or retry-like conditions for which you need to store some data, a date for example. But this is a write operation that does not affect the business logic.
Events can also be considered conditions; after all, they block transitions from moving forward. The difference is that events trigger state machine changes outside of the loop or cronjob task that continuously checks the process for possible changes. When designing the flow you must also consider that the event may occur only once and never again, so maybe it’s better to react to the event outside of the state machine, persist it, and use a condition instead. The same applies to a failing action: the process would not move forward and could get stuck.
Another thing to consider is whether you should define an event and a condition together on a single transition, or on different outgoing paths from the same state. In such cases the flow could be random, and that may not be what you expect.
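The idea of persisting an event and checking it later with a condition could be sketched like this. The functions recordEvent and eventWasRecorded, the event name and the $processData array standing in for real storage are all hypothetical.

```php
<?php
declare(strict_types=1);

// Stand-in for persisted process data: process id => recorded events with timestamps.
$processData = ['res-1' => ['events' => []]];

// Called by whatever receives the event (for example a payment webhook),
// outside of the state machine. Returns the updated data.
function recordEvent(array $processData, string $processId, string $event, int $timestamp): array
{
    $processData[$processId]['events'][$event] = $timestamp;
    return $processData;
}

// Condition evaluated on every cronjob run; it only reads what has been persisted.
function eventWasRecorded(array $processData, string $processId, string $event): bool
{
    return isset($processData[$processId]['events'][$event]);
}

$before = eventWasRecorded($processData, 'res-1', 'payment_received');
$processData = recordEvent($processData, 'res-1', 'payment_received', 1619172000);
$after = eventWasRecorded($processData, 'res-1', 'payment_received');
```

Since the event is stored, it does not matter whether the transition's action fails on the first attempt: the condition stays fulfilled on every later run.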

You always need to think about the flow and be a guardian of the process. Without additional mechanisms like logging the transitions or detecting stuck processes, it may be hard to find out what exactly is wrong with the flow. You do not want the customers to come to the hotel to find out the room has not been booked for them, or that the room is not available because it has not been released after cancellation.

Additional features:

This implementation also supports some of the functionality that is not defined by the state machine model:

global events
state metadata

Global events make it possible to react to a specific occurrence that is not defined by any of the current state transitions. For example, at any given time the customer may call you and request cancellation of a room reservation. If the process allows this, you should not need to create a transition for every state that reacts to the ‘cancel’ event. What you could do instead is handle this event by defining a global transition. Sometimes it may be required to handle the cancellation differently depending on the current state of the process. If the reservation has not been confirmed yet, you may not need to send an email to the customer about its cancellation. On the other hand, if the process is cancelled after booking confirmation, you may need to take additional steps, like notifying the personnel. So the implementation should first check whether a ‘cancel’ event is defined on the current state, and check the global ones only if there is none that can be followed.

State metadata is static data that can be defined on any state. You can use it as part of other logic, like conditions or actions. For example, the global “cancel” event will be effective only for states with the “is_cancellable” flag.
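The lookup order for events, combined with the metadata flag, could be sketched as follows. Only the “cancel” event and the “is_cancellable” flag come from the text; the function resolveEventTarget, the definition layout and the 'required_metadata' key are assumptions for illustration.

```php
<?php
declare(strict_types=1);

// Assumed definition layout with state-level events, global events and metadata.
$definition = [
    'global_events' => [
        'cancel' => ['target' => 'cancelled', 'required_metadata' => 'is_cancellable'],
    ],
    'states' => [
        'new' => ['metadata' => ['is_cancellable' => true], 'events' => []],
        'booked' => [
            'metadata' => ['is_cancellable' => true],
            // State-specific handling: the personnel must be notified.
            'events' => ['cancel' => ['target' => 'cancelled_notify_personnel']],
        ],
        'checked_out' => ['metadata' => ['is_cancellable' => false], 'events' => []],
    ],
];

// Hypothetical lookup: the event defined on the current state wins, the global
// event is only a fallback and is guarded by a metadata flag.
function resolveEventTarget(array $definition, string $activeState, string $event): ?string
{
    $state = $definition['states'][$activeState];
    if (isset($state['events'][$event])) {
        return $state['events'][$event]['target'];
    }
    $global = $definition['global_events'][$event] ?? null;
    if ($global === null) {
        return null;
    }
    $flag = $global['required_metadata'];
    return ($state['metadata'][$flag] ?? false) ? $global['target'] : null;
}

$fromNew = resolveEventTarget($definition, 'new', 'cancel');
$fromBooked = resolveEventTarget($definition, 'booked', 'cancel');
$fromCheckedOut = resolveEventTarget($definition, 'checked_out', 'cancel');
```

In this sketch $fromNew resolves to the global target 'cancelled', $fromBooked to the state-specific 'cancelled_notify_personnel', and $fromCheckedOut to null because the state is not cancellable.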

Implementation Details

I have chosen YAML as the state machine definition format. I find it is appropriate for this case. It should be easy to read and maintain. Specific programming language syntax may take the focus away from the important parts.

YAML or JSON are also easily converted to an array in most languages, and from there you can create DTOs or other data structures. When working on my implementation I used an array, as it is easy to use and programming languages very often provide functions to read or manipulate arrays. Also, I did not want to spend additional execution time on conversion between objects. After all, I may be checking / executing hundreds or thousands of state machine instances in a single process execution. You just need to make sure that the keys are used only by a limited number of classes.

StateMachineRegistry holds a mapping between state machine id and YAML file path. With every call to its read('state_machine_id') method it checks whether it already contains the parsed YAML data. If not, it reads the file from the mapping and keeps the result for future read operations.
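The caching behavior described above could be sketched like this. The class name CachingRegistry, the injected parser callable and the file path are assumptions; a real registry would call an actual YAML parser where this sketch uses a fake one that only counts its calls.

```php
<?php
declare(strict_types=1);

// Hypothetical registry: maps state machine ids to file paths and parses each file once.
final class CachingRegistry
{
    private array $paths;
    /** @var callable receives a file path and returns the parsed definition array */
    private $parseFile;
    private array $parsed = [];

    public function __construct(array $paths, callable $parseFile)
    {
        $this->paths = $paths;
        $this->parseFile = $parseFile;
    }

    public function read(string $stateMachineId): array
    {
        if (!isset($this->parsed[$stateMachineId])) {
            if (!isset($this->paths[$stateMachineId])) {
                throw new InvalidArgumentException("Unknown state machine: $stateMachineId");
            }
            // Parse on first access only and keep the result for later reads.
            $this->parsed[$stateMachineId] = ($this->parseFile)($this->paths[$stateMachineId]);
        }
        return $this->parsed[$stateMachineId];
    }
}

// Fake parser standing in for a YAML library; it counts how often it is called.
$parseCalls = 0;
$fakeParser = function (string $path) use (&$parseCalls): array {
    $parseCalls++;
    return ['source' => $path, 'states' => []];
};

$registry = new CachingRegistry(
    ['room_reservation' => 'definitions/room_reservation.yaml'],
    $fakeParser
);
$first = $registry->read('room_reservation');
$second = $registry->read('room_reservation');
```

After both reads $parseCalls is 1, which matters when hundreds of process instances share the same definition in a single run.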

StateMachineReader is the class responsible for reading the definition. This is the single place that knows the array keys. Some of the public methods it contains:

readState(string $stateName): array
actionExistsFor(array $transition): bool
conditionExistsFor(array $transition): bool
readMetadataByStateByKey(string $stateName, string $key): string

These are very basic read operations. The StateMachine class needs them in order to implement the logic responsible for moving between the states, like:

getActiveStateName(): string
setActiveStateName(string $activeStateName): self
moveForward(): string
triggerEvent(string $event): string
findNextOpenTransition(): array

StateAction and StateCondition are just abstractions defining the interface and default behavior for Action or Condition classes.

In the source code after parsing the YAML you can keep the definition in an array.

Using an array makes it really easy to implement re-usability of the definitions. It’s just a plain recursive array merge, done in the right order.
The benefit is that the files from which I am extending can contain any property. In my YAML definitions the extended files contain only aliases for the most commonly used conditions.
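Such a merge could be sketched with PHP's built-in array_replace_recursive, where later arrays override earlier ones key by key. Choosing this function is an assumption; the article does not say which merge function is used, and the array keys and class names below are made up for illustration.

```php
<?php
declare(strict_types=1);

// Assumed content of a shared file that only defines condition aliases.
$commonConditions = [
    'conditions' => [
        'is_paid' => ['class' => 'IsPaidCondition'],
        'timeout' => ['class' => 'TimeoutCondition', 'arguments' => ['hours' => 24]],
    ],
];

// Assumed process definition that extends the shared file and overrides one argument.
$roomReservation = [
    'conditions' => [
        'timeout' => ['arguments' => ['hours' => 48]],
    ],
    'states' => [
        'waiting_for_payment' => [
            'transitions' => [
                ['target' => 'booked', 'condition' => 'is_paid'],
                ['target' => 'cancelled', 'condition' => 'timeout'],
            ],
        ],
    ],
];

// The right order: base first, the extending definition last, so that it wins.
$definition = array_replace_recursive($commonConditions, $roomReservation);
```

The merged 'timeout' alias keeps its class from the shared file but uses 48 hours. Note that array_replace_recursive also replaces list entries by their numeric index, so extending a list of transitions this way needs some care.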

You will probably also need to log the activity of the state machine.
For that I have implemented a simple StateMachineListener class. It’s injected into the constructor of the StateMachine class. It defines the following methods:

triggeredEvent(StateMachine $stateMachine, string $eventName): void
executedCondition(StateMachine $stateMachine, string $conditionClass, array $arguments, bool $conditionResult): void
executedAction(StateMachine $stateMachine, string $actionClass, array $arguments, bool $actionResult): void
changedState(StateMachine $stateMachine, string $previousState, string $currentState): void

An alternative could be to have a single method notify(StateMachine $sm, Dto $data), but I came to the conclusion that I would then always use conditionals to check whether there was a triggering event, whether an action has been executed, whether there was any condition. By having multiple methods I react only to the things I am interested in. Some of you may think it is not scalable, because if I add any new functionality to the StateMachine class I should probably also update the Listener class. But the whole point is that the StateMachine logic should be simple and not require many modifications. All the necessary business-related logic should go into custom Action and Condition classes. I would not even use the Listener for business logic, but rather for logging or just sending messages to other services, and let them handle the incoming notifications.
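A listener that is used only for logging could be sketched as below. The four method signatures follow the list above; the StateMachine stub with its getId() method and the InMemoryLogListener class are hypothetical and exist only to make the example self-contained.

```php
<?php
declare(strict_types=1);

// Stub standing in for the real StateMachine class; getId() is an assumption.
final class StateMachine
{
    private string $id;

    public function __construct(string $id)
    {
        $this->id = $id;
    }

    public function getId(): string
    {
        return $this->id;
    }
}

// Hypothetical listener that only collects log lines and contains no business logic.
class InMemoryLogListener
{
    public array $lines = [];

    public function triggeredEvent(StateMachine $stateMachine, string $eventName): void
    {
        $this->lines[] = sprintf('[%s] event %s', $stateMachine->getId(), $eventName);
    }

    public function executedCondition(StateMachine $stateMachine, string $conditionClass, array $arguments, bool $conditionResult): void
    {
        // This listener is not interested in conditions, so the method stays empty.
    }

    public function executedAction(StateMachine $stateMachine, string $actionClass, array $arguments, bool $actionResult): void
    {
        $outcome = $actionResult ? 'succeeded' : 'failed';
        $this->lines[] = sprintf('[%s] action %s %s', $stateMachine->getId(), $actionClass, $outcome);
    }

    public function changedState(StateMachine $stateMachine, string $previousState, string $currentState): void
    {
        $this->lines[] = sprintf('[%s] %s -> %s', $stateMachine->getId(), $previousState, $currentState);
    }
}

// Simulates the calls a state machine would make while handling a cancellation.
$machine = new StateMachine('res-1');
$listener = new InMemoryLogListener();
$listener->triggeredEvent($machine, 'cancel');
$listener->executedAction($machine, 'NotifyPersonnel', [], true);
$listener->changedState($machine, 'booked', 'cancelled');
```

With separate methods, ignoring one kind of notification is just an empty method body, without any branching on the type of the notification.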

Versioning

If you change a process definition while your system has already been running for some time, you may come to the conclusion that the changes you are about to make are so big that you want to keep the current definition for already started processes. Making big changes at once very often does not end well. I really encourage you to make all the changes in small iterations. It means less effort for QA, and errors are less likely both during testing and after the release. If you really need to implement versioning, remember it’s not part of the state machine logic, but part of the storage logic. You will need to keep a version id in addition to the already mentioned unique process identifier. Consider this also from the business point of view. Creating a new version of the process definition will only affect new process instances. This can be especially important if your process spans many weeks or months. This may not be an issue for online shops, but think about medical companies or long-term subscriptions. In such cases you may not want your customers to miss the change.
If you store only the state name, then as long as the next definition contains it, there is no problem. The state machine will start from that position. After all, possible transitions are tested during execution. Let the state machine figure out, based on its definition, what the next state should be.
If you plan to remove a state or move it far away from its current position, you need to migrate some of the currently stored processes by simply renaming their stored active state name. In some cases you may need a 2-step release, introducing a temporary state.
This kind of migration is simple, your processes are always up-to-date with the most recent changes, and you do not carry the baggage of legacy definitions, which would most probably increase the maintenance cost. And maintenance costs your business time and money.
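Renaming stored state names could be sketched as follows. The functions migrateStoredStates and findOrphanedProcesses, the state names and the array standing in for storage are hypothetical; in a real system the rename would typically be a single update statement in the storage layer.

```php
<?php
declare(strict_types=1);

// Hypothetical migration: rewrites stored active state names according to a rename map.
function migrateStoredStates(array $storedStates, array $renames): array
{
    foreach ($storedStates as $processId => $stateName) {
        if (isset($renames[$stateName])) {
            $storedStates[$processId] = $renames[$stateName];
        }
    }
    return $storedStates;
}

// Hypothetical safety check: lists processes whose stored state is unknown to the definition.
function findOrphanedProcesses(array $storedStates, array $knownStates): array
{
    return array_keys(array_diff($storedStates, $knownStates));
}

// Stand-in for storage: process id => stored active state name.
$stored = ['res-1' => 'waiting_for_confirmation', 'res-2' => 'booked'];

// States of the assumed new definition, which no longer has 'waiting_for_confirmation'.
$newStates = ['new', 'waiting_for_payment', 'booked', 'cancelled'];

$orphanedBefore = findOrphanedProcesses($stored, $newStates);
$migrated = migrateStoredStates($stored, ['waiting_for_confirmation' => 'waiting_for_payment']);
$orphanedAfter = findOrphanedProcesses($migrated, $newStates);
```

Running the check before the release shows which processes would get stuck ('res-1' here), and running it after the migration should return an empty list.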

Summary

State machines can support your business. Their technical implementation is not rocket science. Properly defining the process flows and guarding them from overcomplicated or simply wrong changes is the hard part. From a technical point of view you can change really a lot at any moment. From a business point of view, making changes while the process is already in progress may require preparation, analysis of the upcoming modifications, and finally defining the required changes, taking the possibilities of the state machine implementation into account.