Controlled automated discovery of collections of business process models
García-Bañuelos, Luciano × Dumas, Marlon La Rosa, Marcello # De Weerdt, Jochen Ekanayake, Chathura #
Information Systems vol:46 pages:85-101
Automated process discovery techniques aim at extracting process models from information system logs. Existing techniques in this space are effective when applied to relatively small or regular logs, but generate spaghetti-like and sometimes inaccurate models when confronted to logs with high variability. In previous work, trace clustering has been applied in an attempt to reduce the size and complexity of automatically discovered process models. The idea is to split the log into clusters and to discover one model per cluster. This leads to a collection of process models – each one representing a variant of the business process – as opposed to an all-encompassing model. Still, models produced in this way may exhibit unacceptably high complexity and low fitness. In this setting, this paper presents a two-way divide-and-conquer process discovery technique, wherein the discovered process models are split on the one hand by variants and on the other hand hierarchically using subprocess extraction. Splitting is performed in a controlled manner in order to achieve user-defined complexity or fitness thresholds. Experiments on real-life logs show that the technique produces collections of models substantially smaller than those extracted by applying existing trace clustering techniques, while allowing the user to control the fitness of the resulting models.