Applied Reinforcement Learning in Process Mining with SberPM library
In the previous story we introduced a basic set of SberPM features. This article further explains the library's functionality, with a focus on the new process and customer path optimization methods that use reinforcement learning for optimal path reconstruction.
Business process optimization is a key component of operational excellence.
Nowadays companies make such decisions based on experts' guesses, but why not use modern machine learning algorithms? SberPM applies reinforcement learning to 'guess' which process path will be optimal according to one of the following criteria:
- No cyclic processes
- Minimum stage execution time
- Shortest path to a stage marked as 'success'
- Or simply a successful process completion
What are good cases for reinforcement learning in process mining?
Surely, when a process is small and obvious there is no need for machine learning; an expert or an analyst will handle it well. But real-world processes have become more complex and tangled. So, if your process:
- …has an overwhelming number of stages. Processes with hundreds or thousands of stages are hard to analyze and next to impossible to visualize.
- …contains lots of unique paths. Process paths are not equally frequent, and RL helps identify and filter out the non-optimal ones.
- …is in a redesign phase. RL will generate hints and an optimal 'to-be' state of the process.
- …is a customer journey. RL helps find a fast and easy way to bring the customer to a successful transaction.
Each case above involves an amount of data too vast to inspect manually. RL allows you both to understand the customer's needs and to optimize their path, increasing speed and conversion.
Reinforcement Learning in 1 minute
In reinforcement learning a machine learning model learns to solve a problem by trial and error, which is very close to the way we learn new skills in real life.
Reinforcement learning typically distinguishes two abstract objects: the environment and the agent. The agent interacts with the environment and receives feedback, also known as rewards. During learning, the agent strives to perform actions that maximize its rewards.
So, in process mining the environment is defined by an event log, and the penalties and rewards are defined by the user.
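To make this concrete, here is a minimal, library-agnostic sketch of the agent-environment loop in process-mining terms. All names and numbers are illustrative and are not part of SberPM's API:

```python
import random

# States are process stages; actions are the transitions available from a stage.
transitions = {
    "begin": ["A"],
    "A": ["B", "C"],
    "B": ["A", "end"],   # B -> A forms a cycle
    "C": ["end"],
}
# User-defined rewards: a bonus for the successful transition, a fee for the cycle.
rewards = {("C", "end"): 10.0, ("B", "A"): -5.0}

def run_episode():
    """One trial: walk from 'begin' to 'end', collecting rewards along the way."""
    state, total, path = "begin", 0.0, ["begin"]
    while state != "end":
        action = random.choice(transitions[state])   # the agent picks a transition
        total += rewards.get((state, action), -1.0)  # small default cost per step
        state = action
        path.append(state)
    return path, total

# Over many such trials a learning agent shifts towards high-reward paths.
print(run_episode())
```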
Using SberPM to identify the optimal process path
Let’s get practical and dive into the whole thing with SberPM.
Step 1: create environment and select a method
- data_holder contains the event log (see the examples in part 1)
- RLOptimizer creates the process environment, where states are the stages of a process and actions are transitions between the stages. Each transition is rewarded according to a user config (i.e. tied to business aims). Reward design plays a crucial role here.
Here is a small example; let's assume we have a process and the rewards are already designed:
The agent is placed in the begin state and all possible transitions in the process are already set, so the agent "chooses" its path to the end state by maximizing the total reward (the green and red numbers in the picture above) for completing the process path.
Note: you should select a learning method when the environment is created:
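To illustrate with made-up numbers (the actual rewards are shown in the picture): if each forward transition earns +1, reaching the success stage earns +10, and a repeated stage costs -5, then the chain begin->A->C->success->end collects 1 + 1 + 10 + 1 = 13, while a chain that loops through the same stage before succeeding collects less, so the agent learns to prefer the former.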
- Exploitation: while training, the agent follows the chains from the log file only.
- Exploration: the agent is free to choose actions on its own from the very beginning. This way, chains that were not part of the data may appear during the learning process.
For example, with the Exploration method the agent could discover the optimal chain begin->A->C->success->end even if the log only contained chains such as begin->A->B->…, whereas with the Exploitation method a chain that never appears in the data would not be found.
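Putting Step 1 into code, a minimal sketch could look like the one below. The DataHolder usage follows part 1 of this series; the RLOptimizer import path, constructor arguments, and the method parameter are assumptions based on the description above rather than verbatim SberPM API, so check the repository examples for the exact names:

```python
# Sketch of Step 1 (environment creation). Column names and the RLOptimizer
# arguments are illustrative assumptions, not verbatim SberPM API.
import pandas as pd
from sberpm import DataHolder
from sberpm.ml.rl import RLOptimizer  # assumed import path; see the SberPM repo

# One row per executed stage of a process instance (column names are illustrative).
df = pd.read_csv("event_log.csv")

data_holder = DataHolder(
    data=df,
    id_column="case_id",                # process instance identifier
    activity_column="stage",            # stage (activity) name
    start_timestamp_column="start_dt",  # when the stage was executed
)

# States are the stages of the process, actions are transitions between them.
# method="exploitation" keeps the agent on chains seen in the log;
# method="exploration" lets it compose new chains on its own (assumed parameter name).
rl_optimizer = RLOptimizer(data_holder, method="exploration")
```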
Step 2: setting the rewards
Let’s set the rewards:
Reward design is the key to successful process optimization. We need to decide which metric is the most important for the process, then set the successful outcomes and define the fee for cycles. The fee can be reduced if certain cycles are normal for the process.
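As an illustration, a reward configuration might look roughly like the following; the keys and values are hypothetical, chosen to mirror the criteria listed at the beginning of the article, and are not actual SberPM parameters:

```python
# Hypothetical reward design (illustrative names, not SberPM API).
reward_config = {
    "success_activities": ["success"],  # stages that mark a successful outcome
    "success_reward": 10.0,             # bonus for reaching a successful stage
    "step_penalty": -1.0,               # small fee per transition: shorter paths win
    "cycle_penalty": -5.0,              # fee for revisiting a stage; reduce it if
                                        # certain cycles are normal for the process
}
# How this config is passed to the optimizer depends on the SberPM version;
# see the repository examples for the exact call.
```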
Step 3: learning and reconstruction
Let’s start training!
The result of the fit method is a pandas.DataFrame with the activity chains generated by the agent and their rewards. The reconstructed process is the set of the best chains.
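Continuing the sketch above (same caveats: the argument and column names are guesses, only the existence of fit and its DataFrame result follow from the description), Step 3 boils down to:

```python
# Hypothetical training call; per the description above, fit() returns a
# pandas.DataFrame with the generated activity chains and their rewards.
chains_df = rl_optimizer.fit()

# Keep the highest-reward chains: together they form the reconstructed process.
best_chains = chains_df.sort_values("reward", ascending=False).head(10)  # "reward" column name assumed
print(best_chains)
```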
And yes, it’s that simple! So, it’s time to test the approach on some real data. We took a BPI Challenge 2020 dataset and ran the pipeline described above.
BPI 2020 results
This is what the initial process looks like (from raw event data):
The graph is visually structured and you can see various deviations from the main process flow.
- 2.36% of processes contain cycles;
- 8 stages in the average path;
- 74% probability of successful completion.
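For reference, metrics of this kind can be computed straight from the event log with plain pandas; here is a rough sketch, assuming the same illustrative column names as above and a final 'success' stage:

```python
import pandas as pd

df = pd.read_csv("event_log.csv")  # assumed columns: case_id, stage, start_dt
traces = df.sort_values("start_dt").groupby("case_id")["stage"].apply(list)

# Share of process instances that revisit at least one stage (cycles).
cycled_share = traces.apply(lambda t: len(t) != len(set(t))).mean()

# Average number of stages per instance.
avg_length = traces.apply(len).mean()

# Share of instances that reach the 'success' stage.
success_share = traces.apply(lambda t: "success" in t).mean()

print(f"cycles: {cycled_share:.2%}, avg length: {avg_length:.1f}, success: {success_share:.2%}")
```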
The metrics are quite good for this type of process. Let's train an algorithm, optimize the process with the SberPM RL module, and compare the results.
After executing the pipeline from above, the reconstructed (optimized) process looks like this:
The 'dream' process has a distinct structure that preserves the business logic of the original, and you can see the main path and its deviations. Let's calculate the metrics for the reconstructed process:
- 0% of processes contain cycles;
- 4.7 stages in the average path (40% shorter);
- 100% probability of successful completion.
Conclusion
The RL module has already been used successfully in business process and client path analysis, and it is now available in the SberPM library repository on GitHub.
We appreciate your feedback on our projects; your comments and reviews will help us create an easy-to-use and functional product.
Feel free to reach us by email: Aleksandr Korekov (av.korekov@gmail.com) and Danil Smetanev (smetanev.danil@gmail.com), or directly on GitHub.