1.1 Process

A process is a program that is running on your computer. This can be anything from a small background task, such as a spell-checker or system events handler, to a full-blown application like Internet Explorer or Microsoft Word. All processes are composed of one or more threads. Since most operating systems have many background tasks running, your computer is likely to have many more processes running than actual programs. For example, you may only have three programs running, but there may be twenty active processes. [1]

1.2 Mining

Mining refers to collecting data from several perspectives and summarizing it into useful information. With the support of this information, we can achieve efficient process mining.

1.3 Event Logs

An event log is a collection of cases, where each element refers to a case, an activity, and a point in time (a timestamp). Sources of event data are everywhere: the transaction log of a database system (e.g. a trading system), a business suite/ERP system (SAP, Oracle, ...), a message log (e.g. from IBM middleware), an open API providing data from websites or social media, CSV (comma-separated values) files or spreadsheets, etc. When extracting an event log, you can face the following challenges:

1) Correlation. Events in an event log are grouped per case. This simple requirement can be quite challenging, as it requires event correlation, i.e., events need to be related to each other.
2) Timestamps. Events need to be ordered per case. Typical problems: only dates, different clocks, delayed logging.
3) Snapshots. Cases may have a lifetime extending beyond the recorded period, e.g. a case was started before the beginning of the event log.
4) Scoping. How to decide which tables to incorporate?
5) Granularity. The events in the event log are at a different level of granularity than the activities relevant for end users.

Additionally, event logs without preprocessing have so-called noise and incompleteness.
Noise means the event log contains rare and infrequent behavior that is not representative of the typical behavior of the process. Incompleteness means the event log contains too few events to be able to discover some of the underlying control-flow structures. There are many methods to "clean" the data and use only useful data, such as filtering and data mining techniques. Every event log must have certain fields without which PM will be impossible. Figure 1 shows the basic attributes of the events in the log:

Case ID – the instances (objects) whose events are arranged into sequences in the log.
Activity name – the actions performed within the event log.
Timestamp – the date and time at which a log event was recorded.
Resource – the key actors of the log events (those who perform the actions in the event log).

"Data is the new oil" is said to emphasize the important role data plays nowadays. It is not sufficient to focus on data storage and data analysis; the data scientist also needs to relate data to process analysis. Process mining bridges the gap between traditional model-based process analysis (e.g., simulation and other business process management techniques) and data-centric analysis techniques such as machine learning and data mining. [2]

Data mining, also called knowledge discovery in databases, is, in computer science, the process of discovering interesting and useful patterns and relationships in large volumes of data. The field combines tools from statistics and artificial intelligence (such as neural networks and machine learning) with database management to analyze large digital collections, known as data sets. Data mining is widely used in business (insurance, banking, retail), science research (astronomy, medicine), and government security (detection of criminals and terrorists).
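The basic event attributes listed above, and the correlation and ordering challenges from 1.3, can be illustrated with a minimal, hand-made event log. Everything here (case ids, activities, resources, timestamps) is invented for illustration: events are grouped per case (correlation) and sorted per case by timestamp (ordering) to obtain traces.

```python
from collections import defaultdict
from datetime import datetime

# A minimal event log: every event carries the four basic attributes.
events = [
    {"case": "order-1", "activity": "register", "timestamp": datetime(2023, 1, 5, 9, 0),  "resource": "Alice"},
    {"case": "order-2", "activity": "register", "timestamp": datetime(2023, 1, 5, 9, 30), "resource": "Bob"},
    {"case": "order-1", "activity": "check",    "timestamp": datetime(2023, 1, 5, 10, 0), "resource": "Bob"},
    {"case": "order-1", "activity": "ship",     "timestamp": datetime(2023, 1, 6, 8, 0),  "resource": "Carol"},
    {"case": "order-2", "activity": "check",    "timestamp": datetime(2023, 1, 6, 9, 0),  "resource": "Alice"},
]

# Correlation: group events per case.
cases = defaultdict(list)
for e in events:
    cases[e["case"]].append(e)

# Ordering: sort each case's events by timestamp, then keep the activity names.
traces = {c: [e["activity"] for e in sorted(es, key=lambda e: e["timestamp"])]
          for c, es in cases.items()}

print(traces)
# {'order-1': ['register', 'check', 'ship'], 'order-2': ['register', 'check']}
```

Note that "order-2" is an incomplete (snapshot) case: its lifetime extends beyond the recorded period, exactly the situation described in challenge 3.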
Similarities between data mining and process mining: both techniques are used to analyze amounts of data that would be impossible to analyze manually; both produce information that can be used for making business decisions; and both use "mining" techniques in which algorithms traverse large volumes of data looking for patterns and relationships. So there are some similarities, and both techniques can be categorized as Business Intelligence. But, as mentioned before, the two techniques have different perspectives and goals. A difference between data mining and process mining is that data mining techniques use multi-dimensional views (cubes) on data, which can be drilled up and down (to different aggregation levels). [4]

2. Process Mining

2.1 Overview

The term "process" can also be used as a verb, meaning to perform a series of operations on a set of data collected through mining. This whole process of collecting and analyzing the big data mined in the mining phase is called process mining. Process mining (PM) techniques are able to extract knowledge from event logs commonly available in today's information systems. These techniques provide new means to discover, monitor, and improve processes in a variety of application domains. There are two main drivers for the growing interest in process mining:

1) more and more events are being recorded, thus providing detailed information about the history of processes;
2) there is a need to improve and support business processes in competitive and rapidly changing environments.

2.2 Types of Process Mining

There are three main types of process mining.

1. The first type of process mining is discovery. A discovery technique takes an event log and produces a process model without using any a-priori information. An example is the Alpha-algorithm, which takes an event log and produces a process model (a Petri net) explaining the behavior recorded in the log.
This type of PM helps the user to compile a process model from scratch, taking as input the tasks logged in the event log, which can be all the processes that run from login until the user logs out of the system. [6]

2. The second type of process mining is conformance. Here, an existing process model is compared with an event log of the same process. Conformance checking can be used to check if reality, as recorded in the log, conforms to the model and vice versa. It gives a diagnostic report after comparing the process model created in discovery with the event logs from which that model was created, checking for anomalies and, if programmed to do so, raising a system alert to make the anomaly known. [6]

3. The third type of process mining is enhancement. The main idea is to extend or improve an existing process model using information about the actual process recorded in some event log. Whereas conformance checking measures the alignment between model and reality, this third type of process mining aims at changing or extending the a-priori model. An example is the extension of a process model with performance information, e.g., showing bottlenecks. This might be done on a currently working system where the model and event logs do conform and the system runs correctly, but is not very efficient or leaves a window for anomalies that could lead to intrusion or system breakdown. To prevent this, the entire model is run through enhancement to generate a new model that is efficient and foolproof. [6]

Figure 2. The three basic types of process mining, explained in terms of the input each requires and the output it gives the user.

3. Approach to PM

3.1 Real-time Process Mining

The term 'real time' is used subjectively of systems which appear to process information 'fast'. Formally, real-time systems 'must react within precise time constraints to events in the environment'.
The key is predictability and results guaranteed within a specified time, rather than raw speed. This means identifying process change as soon as possible, but with confidence that the change was a real anomaly and not just a simple user-interaction mistake. We consider two main constraints: accuracy and time. The mining algorithm should check that the mined model is 'close' to the 'true' model; we expect accuracy to increase with the amount of data, but this will also increase mining time. So these two constraints act in tension: we desire to minimize mining time, but the characteristics of the ground-truth distribution determine the minimum data needed for confidence in mining accuracy. This lower bound on data ensures we use the correct baseline against which to measure change. Although an upper bound can be set on the mining time, this will be constrained by the overhead of the algorithm, or by the time taken to process each run-through in the event log, and by the desired accuracy. There are other issues which we do not consider, such as predicting the time to detect a change from its type or magnitude, or environmental issues which may affect the real-time behavior of the system. [4]

3.2 Determining the Amount of Data Needed for Mining

One way to determine the amount of data needed is to consider the structures in a process (highlighted in Fig. 1) and the probability of an algorithm discovering these structures. In [14] we discuss this approach and apply it to the Alpha algorithm [11], which uses heuristics about the relations seen between pairs of tasks in the log to construct a Petri net. To compare this non-probabilistic model against the ground-truth distribution, we convert the net to a PDFA by labelling its reachability graph (RG) with maximum-likelihood probabilities obtained from the mining log. This allows us to satisfy the accuracy constraint.
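The "relations seen between pairs of tasks" that the Alpha algorithm builds on can be sketched as a footprint: x -> y (causality) if y directly follows x in some trace but never the reverse, x || y (parallel) if both directions occur, and x # y otherwise. This is only the first step of Alpha, not the full Petri-net construction, and the traces below are invented:

```python
from itertools import product

traces = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "e", "d"]]

# Directly-follows pairs: (x, y) iff y immediately follows x in some trace.
follows = {(t[i], t[i + 1]) for t in traces for i in range(len(t) - 1)}

tasks = sorted({x for t in traces for x in t})
footprint = {}
for x, y in product(tasks, tasks):
    if (x, y) in follows and (y, x) not in follows:
        footprint[(x, y)] = "->"   # causality
    elif (x, y) in follows and (y, x) in follows:
        footprint[(x, y)] = "||"   # b and c swap order, so they run in parallel
    else:
        footprint[(x, y)] = "#"    # no direct relation

print(footprint[("a", "b")], footprint[("b", "c")], footprint[("a", "d")])
# -> || #
```

From such a footprint the real Alpha algorithm then derives places and transitions of a Petri net; that construction is omitted here.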
We do not address the time constraint, since Alpha has low complexity, and although the time to generate the RG is exponential in the number of states, we use only simple acyclic models. Business process models are in general relatively simple, but further work is needed to validate the efficiency of our approach. [4]

3.3 Methods to Detect Process Change

We mine repeatedly from sub-logs using a 'sliding window', and compare them to the live current user trace, that is, the distribution generated by the mined model against the ground-truth distribution. There are many distance measures, but it is not clear what distance is statistically significant. Instead, we use statistical tests to detect whether the mined distribution, or its PDFA representation, has changed significantly from the ground truth or not. [4]

4. PM Model Generation

4.1 Preprocessing

As stated above, the starting point for process mining is the event log. Throughout this paper, the term trace refers to a process instance of a process model in the log; it represents the order in which activities are executed, with all information recorded for every event. The goal of the PAHID model is to discover attacked traces in the log. The first step of the proposed PAHID model is to keep (or remove) activities or traces in the log that are (or are not) appropriate and important for analysis [1]. Hence, three types of tasks can be executed in this step, based on the decision of the domain analyst:

• Removing incomplete traces from the event log.
• Completing incomplete traces by rerunning the whole process again or filling them in from previous traces.
• Removing tasks irrelevant to the traces, to make an efficient event log.

4.2 Anomaly Detection

Anomaly detection constructs a reference model that represents normal behavior of the information system and its users. Then, this reference model is used to analyze the current activities of the system, looking for any deviation from it.
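A much-simplified version of this idea can be sketched as follows: take the frequent directly-follows pairs of the current log as the reference model, and flag as anomalous any trace containing a pair outside it. Real PM tools would use proper discovery and conformance algorithms here; the log, activities, and the 20% frequency threshold below are all assumptions for illustration:

```python
from collections import Counter

# Invented log: eight normal traces and one deviating one.
log = [["login", "read", "logout"]] * 8 + [["login", "delete", "logout"]]

# Count directly-follows pairs over the whole log.
pair_counts = Counter((t[i], t[i + 1]) for t in log for i in range(len(t) - 1))

# Reference model: pairs occurring in at least 20% of traces (assumed threshold).
threshold = 0.2 * len(log)
reference = {p for p, n in pair_counts.items() if n >= threshold}

def is_anomalous(trace):
    """A trace is anomalous if any directly-follows pair falls outside the reference."""
    return any((trace[i], trace[i + 1]) not in reference
               for i in range(len(trace) - 1))

flags = [is_anomalous(t) for t in log]
print(flags)  # only the single 'delete' trace is flagged
```

Because the reference is built from the current log itself, rare behavior stands out without any a-priori normal model, which is exactly the assumption the PAHID model makes.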
For constructing this reference model, an extensive training set (log) of normal behavior of the system and users is needed. Nevertheless, because of the unpredictable behavior of users, such a normal log cannot cover all of the possibilities, or it becomes very complex. Moreover, in flexible application domains such a normal log is not known before execution of the information system. Therefore, in the proposed model it is assumed that a normal log/model is not available. Instead, the reference model is constructed during the anomaly detection phase using the current event log of the information system; this reference model is called the appropriate model. To perform the anomaly detection phase more quickly and easily, the domain analyst can remove some events from the log traces. These events are not related to control-flow perspective analysis; for example, tasks that will be analyzed in the misuse detection phase can be removed in this step. The simple log filtering tools of ProM can be used to remove irrelevant events from traces in the log. The input is the preprocessed log, LP, gained from the previous step, and the output is the control-flow preprocessed log, LCP, in MXML format. [1]

4.3 Misuse Detection

To increase the accuracy of detection and to detect more attacks, misuse detection is used to detect misused traces in the organizational perspective. Misuse detectors monitor the system activities to find predefined events or sets of events. These events or sets of events represent the behavior pattern of a known attack. Such patterns can be defined in the form of rules, as in this work. The rules are checked over all of the traces in the log; misused traces are those that fit any attack rule. For implementing this step, four types of attacks related to the organizational perspective are considered.
These attacks can gain control of the information system by exploiting a variety of system flaws:

• User to Root (U2R): an authorized (legitimate) user gains unauthorized access to the information system.
• Remote to Root (R2R): an unauthorized remote user gains access to the information system from the Internet.
• Password Guessing: the intruder tries to guess the password of a user by entering an incorrect username and password more than three times.
• Admin High Privilege Misuse: the administrator of the information system misuses his/her high privileges and maliciously performs activities of other users.

These are the four main types of attacks; many more could be considered. [1]

4.4 Result

Anomalies in the control-flow perspective and misuses in the organizational perspective are detected. The merging phase takes the results of the previous phases (anomalous traces and misused traces) and merges them into attacked traces with their attack types. Every attack can be an anomaly in the order of activities or any of the four attack types considered in the misuse detection phase. [1] This phase is implemented through a Merging program developed in the Java programming language. The inputs are the anomalous and misused traces, and the outputs are the attacked traces with their types of attack. Using the design descriptions stated above, Figure 3 illustrates the PAHID model in more detail. [1]

5. Operational Working

5.1 Detect

The figure below illustrates the 'detect' type of operational support. Users are interacting with some enterprise information system. Based on their actions, events are recorded. The partial trace of each case is continuously checked by the operational support system, which immediately generates an alert if a deviation is detected. [4]

5.2 Predict

We again consider the setting in which users are interacting with some enterprise information system. The events recorded for cases can be sent to the operational support system in the form of partial traces.
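A minimal sketch of what an operational support system could do with such partial traces: predict the remaining time of a running case from completed historical cases that share the partial trace as a prefix. This is a simplification of a predictive model; the activities, durations (in hours), and averaging strategy are all invented for illustration:

```python
# Completed historical cases: trace -> observed total durations in hours (invented data).
history = {
    ("register", "check", "ship"): [48, 72],
    ("register", "check", "reject"): [24],
}

def predict_remaining(partial, elapsed):
    """Average total duration of completed cases starting with `partial`, minus elapsed time."""
    durations = [d for trace, ds in history.items() for d in ds
                 if trace[:len(partial)] == tuple(partial)]
    if not durations:
        return None                      # no comparable history for this prefix
    return sum(durations) / len(durations) - elapsed

# A running case has done ["register", "check"] and 20 hours have elapsed.
print(predict_remaining(["register", "check"], elapsed=20))
# (48 + 72 + 24) / 3 - 20 = 28.0 hours remaining
```

Real predictive techniques in PM annotate a discovered model (for example a transition system) with time information rather than matching raw prefixes, but the idea of learning from completed cases is the same.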
Based on such a partial trace and some predictive model, a prediction is generated. [4]

5.3 Recommend

This setting is similar to prediction. However, the response is not a prediction but a recommendation about what to do next. To provide such a recommendation, a model is learned from 'post mortem' data. A recommendation is always given with respect to a specific goal: for example, to minimize the remaining flow time, or to decrease the total cost and maximize the number of cases handled within a time period. [4]

6. PM Process Types

6.1 Lasagna Processes

A process is a Lasagna process if, with limited effort, it is possible to create an agreed-upon process model that has a fitness of at least 80%. [4] The main characteristics of Lasagna processes are:

• Easy to discover, but it is less interesting to show the "real" process.
• The whole process mining toolbox can be applied.
• The added value is predominantly in more advanced forms of process mining based on aligning log and model.

6.2 Spaghetti Processes

These are less structured than Lasagna processes, and only some process mining techniques can be applied to them. There are different approaches to getting a valuable analysis out of such processes: for example, divide and conquer (by clustering of cases), or showing only the most frequent paths and activities. [4]

6.3 Applications

Lasagna processes are typically encountered in production, finance/accounting, procurement, logistics, resource management, and sales/CRM. Spaghetti processes are typically encountered in product development, service, resource management, and sales/CRM. [4]

Figure 3. Applications of Spaghetti (violet cells), Lasagna (blue cells) processes and both (pink cells).

Nevertheless, Spaghetti processes are very interesting from the viewpoint of PM, as they often allow for various improvements. A highly structured, well-organized process is often less interesting in this respect: it is easy to apply PM techniques, but there is also little improvement potential.
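The 80% fitness threshold can be illustrated with a deliberately crude fitness measure: the fraction of traces the model can replay end to end. Proper fitness in PM is computed with token replay or alignments; the toy model, log, and measure below are all simplified assumptions:

```python
# Toy model: allowed directly-follows moves plus fixed start and end activities (assumed).
model = {"start": "a", "end": "d",
         "moves": {("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")}}

def replayable(trace):
    """True if the model can replay the whole trace from start to end."""
    if not trace or trace[0] != model["start"] or trace[-1] != model["end"]:
        return False
    return all((trace[i], trace[i + 1]) in model["moves"]
               for i in range(len(trace) - 1))

# Ten invented traces; one of them ('a' straight to 'd') deviates from the model.
log = [["a", "b", "c", "d"]] * 5 + [["a", "c", "d"]] * 4 + [["a", "d"]]

fitness = sum(replayable(t) for t in log) / len(log)
print(fitness, "Lasagna" if fitness >= 0.8 else "Spaghetti")
# 0.9 Lasagna
```

Under this measure the example log scores 0.9, so the process would be classified as Lasagna; a log where most traces wander off the model would fall below 0.8 and be treated as Spaghetti.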
7. Tools for PM

All the techniques described above are realized in software such as ProM. ProM is an extensible framework that supports a wide variety of process mining techniques in the form of plug-ins. The main characteristics of ProM:

• Aims to cover every aspect of process mining in a single framework.
• Notations supported: Petri nets (many types), BPMN, C-nets, fuzzy models, transition systems, Declare, etc.
• Also supports conformance checking and operational support.
• Many plug-ins are experimental prototypes and not user friendly. It is an extremely powerful instrument, but confusing for some users. Nowadays there already exist 600 plug-ins, and this number keeps growing.

There is also the commercial software Disco, which has the following characteristics:

• Focuses on discovery and performance analysis (including animation).
• Powerful filtering capabilities for comparative process mining and ad-hoc checking of patterns.
• Uses a variant of fuzzy models, etc.
• Does not support conformance checking or operational support.
• Easy to use, with excellent performance. Disco can be used by inexperienced people and has an intuitive, user-friendly interface. [4]

8. Future Developments

Refining the process mining framework: today much data is updated in real time, and sufficient computing power is available to analyze events as they occur. Therefore, PM should not be restricted to off-line analysis and can also be used for online operational support. Provenance refers to the data that is needed to be able to reproduce an experiment. Data in event logs is partitioned into 'pre mortem' and 'post mortem'. 'Post mortem' data is information about cases that have completed; it can be used for process improvement and auditing, but not for influencing the cases themselves. 'Pre mortem' data concerns cases that have not yet completed and can be exploited to ensure the correct or efficient handling of those cases. [5]
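The pre mortem / post mortem split described above can be sketched directly: a case is post mortem once its closing activity has been recorded, otherwise it is still running and is only usable for operational support. The closing activity and the traces are assumptions for illustration:

```python
END = "archive"   # assumed closing activity of the process

traces = {
    "c1": ["register", "check", "archive"],   # completed: post mortem
    "c2": ["register", "check"],              # still running: pre mortem
    "c3": ["register", "archive"],            # completed: post mortem
}

# Post mortem: cases whose last recorded activity is the closing one.
post_mortem = {c: t for c, t in traces.items() if t and t[-1] == END}
# Pre mortem: everything else, i.e. cases that have not yet completed.
pre_mortem = {c: t for c, t in traces.items() if c not in post_mortem}

print(sorted(post_mortem), sorted(pre_mortem))  # ['c1', 'c3'] ['c2']
```

Only the post-mortem part would feed process improvement and auditing; the pre-mortem part is what an online operational support system would act on.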
9. Unresolved Challenges of PM

Nevertheless, as a fairly new approach, PM still has many unsolved challenges:

- There are no negative examples (i.e., a log shows what has happened, but does not show what could not happen).
- Preprocessing of the event log (problems with noise and incompleteness).
- It is not clear how to correctly recognize the attributes of an event log.
- Due to concurrency, loops, and choices, the search space has a complex structure, and the log typically contains only a fraction of all possible behaviors.
- There is no clear relation between the size of a model and its behavior.
- Improving the representational bias used for process discovery.
- Balancing between quality criteria such as fitness, simplicity, precision, and generalization.
- Improving usability and understandability for non-experts. [5]

10. Conclusion

PM is an important tool for modern organizations that need to manage nontrivial operational processes. Data mining techniques aim to describe and understand reality based on historic data, but at a lower level of analysis, because these techniques are not process-centric. Unlike most BPM approaches, PM is driven by factual event data rather than hand-made models. That is why PM is called a bridge between BPM and data mining. PM is not limited to process discovery: by connecting the event log and the process model, new ways of analysis are opened up, and a discovered process model can be extended with information from various perspectives. The torrents of event data available in most organizations enable evidence-based Business Process Management (ebBPM). We predict that there will be a remarkable shift from purely model-driven or questionnaire-driven approaches to data-driven process analysis, as we become able to monitor and reconstruct the real business processes using event data. At the same time, we expect that machine learning and data mining approaches will become more process-centric.
Thus far, the machine learning and data mining communities have not been focusing on end-to-end processes that also exhibit concurrency. Hence, it is time to move beyond decision trees, clustering, and (association) rules. Process mining can be used to diagnose the actual processes. This is valuable because, in many organizations, most stakeholders lack a correct, objective, and accurate view of important operational processes. Process mining can subsequently be used to improve such processes. Conformance checking can be used for auditing and compliance: by replaying the event log on a process model, it is possible to quantify and visualize deviations. Similar techniques can be used to detect bottlenecks and to build predictive models. [5]

