Business process discovery: new techniques and applications.

Publication date: 2012-10-23

Author:

De Weerdt, Jochen

Abstract:

English Summary

Within today's organizations, electronic data are being collected and accumulated at a rapid pace. Although substantial investments have been made in data collection and storage, comparable investments in analyzing these data lag behind. Especially with respect to business processes, automation keeps increasing, which causes a significant growth in the availability of process-related data. Within and across departments, information systems have been put in place to coordinate and facilitate the business processes. Since these systems keep track of everyday business transactions, important opportunities arise for analyzing process-related data so as to provide insight into the actual way of working. Moreover, automation often increases complexity and reduces the overview of the overall end-to-end process. Because of this lack of overview and tangible insight, there are vast opportunities for data analytics, a domain that can best be defined as a broad set of intelligent techniques for gaining insights from data. When the focal point of these techniques is business processes, the field of research is often termed Process Mining. To be precise, the key objective of Process Mining is the extraction of non-trivial knowledge from event data as recorded by information systems.

The Field of Process Mining: At the Intersection of BPM and KDD

Process Mining is a relatively young area of academic research. The main goal of process mining techniques is to employ process-related data in order to extract information and knowledge, for instance by automatically discovering a process model. As such, Process Mining finds itself at the intersection of two larger domains: Business Process Management (BPM) and Knowledge Discovery in Databases (KDD). BPM is the collective term for the concepts, methods and techniques that support the design, configuration, enactment, and analysis of business processes.
The cradle of BPM is process modeling, which refers to the identification and specification of business processes. In general, four important phases can be identified in the life cycle of a business process: design, implementation, execution and diagnosis. Process Mining is an integral part of the diagnosis phase in the BPM life cycle. In contrast to Business Process Intelligence (BPI) and Business Activity Monitoring (BAM), domains that also fit in this phase, Process Mining offers a more powerful set of methods for a thorough, bottom-up investigation. Process mining techniques do not assume any of the critical points to be known upfront, since this type of analysis will often start with the discovery of the actual business processes from the recorded data. The exploratory nature of Process Mining allows a relatively unbiased examination of the business process at hand. In this way, Process Mining proves to be an ideal means for guiding process improvement and redesign approaches.

Secondly, Process Mining is strongly related to the field of Knowledge Discovery in Databases (KDD). KDD is often defined as the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data. Data mining techniques play an important role in this process since they actually extract patterns from data. In the last decade, the importance of KDD and data mining has grown significantly, since organizations that excel in their analytical capabilities create a significant competitive advantage.

Metrics for Quantifying the Quality of Discovered Process Models

The first part of this thesis covers the problem of evaluating the quality of discovered process models. The quality of a discovered process model can be judged from a multitude of perspectives. These perspectives can be categorized into two high-level dimensions: accuracy and comprehensibility. Our contribution is twofold.
First of all, an evaluation study is performed on existing evaluation metrics. With this study, the advantages and drawbacks of currently available metrics are mapped. Secondly, based on the idea of inducing artificial negative events into an event log, a novel precision metric is proposed. This precision metric serves as the input for the application of the well-known F-score for process discovery evaluation. This novel evaluation methodology is considered an important step towards the definition of a more comprehensive evaluation framework for discovered process models.

A Multi-Dimensional Quality Assessment of Process Discovery Techniques

Within the field of Process Mining, there is a traditional reliance on artificial data in order to develop learning algorithms. This focus on artificially generated data is logical for two reasons. Firstly, it allows researchers to develop algorithms that are able to mine special process constructs. Secondly, relying on artificial data allows for a straightforward analysis of the correctness of a process discovery technique, since the behavior that is part of the artificially generated event log is known upfront. Accordingly, the evaluation of the quality of process discovery techniques in practice has received only modest attention. In this respect, this work contributes to the literature with the first multi-dimensional quality assessment of state-of-the-art process discovery techniques using eight real-life event logs. Furthermore, the discovery techniques are gauged along three key quality criteria, namely accuracy, comprehensibility and scalability.

Improving Process Discovery with Active Trace Clustering

By far the most arduous challenge for process discovery algorithms consists of tackling the problem of accurate and comprehensible knowledge discovery within highly flexible business process environments.
Event logs from such flexible systems often contain a large variety of process executions, which makes the application of Process Mining most interesting. However, simply applying existing process discovery techniques will often yield highly incomprehensible process models because of their inaccuracy and complexity. To resolve this problem, trace clustering is a very interesting approach, since it allows splitting up an existing event log so as to facilitate the knowledge discovery process. In this chapter, a novel trace clustering technique is described which differs significantly from previous approaches. Above all, it starts from the observation that currently available techniques suffer from a large divergence between the clustering bias and the evaluation bias. By employing an approach inspired by active learning, this bias divergence is resolved. In an assessment using both a controlled environment and four real-life event logs, it is shown that our technique significantly outperforms currently available trace clustering techniques from a process discovery evaluation perspective.

Real-Life Case Studies

Finally, this thesis contains a description of four real-life case studies in which the application of process mining techniques is demonstrated in areas such as financial services, telecommunications, healthcare and public services.