Applied Machine Learning: A Practical Approach
Question 1: What does “Machine Learning” mean to you?
Machine Learning (ML), Artificial Intelligence (AI), Data Analytics, and similar terms have become ubiquitous. The encyclopedic definition of Machine Learning reads: “ML is the study of computer algorithms that improve automatically through experience. It is seen as a part of AI.” Because this definition is ambiguous, it often does not translate well into business applications. By contrast, business practitioners offer more pragmatic descriptions of ML, along with useful guidelines for its practical application, by drawing on existing use case scenarios, developing new ones, and analyzing proven outcomes.
My view of the methodologies and tools related to ML and AI, which also informs the way I promote their application at Dynamic Risk, is quite practical. Namely: the ML- and AI-related tools implemented at Dynamic Risk should enhance our understanding of, and reasoning within, the knowledge domains of real-life situations encountered by pipeline operators as part of pipeline integrity management. A primary focus of applying ML-related tools should be on the underlying causal mechanisms that generate the data, rather than just on finding correlations within the data. “Calculators for Reasoning” is my brief shorthand for what real-life AI- and ML-related tools should be.
More specifically, real-life ML-related methodologies and tools applied in the pipeline industry should be:
- Probabilistic: incorporating probabilistic, not just deterministic, quantitative underpinning;
- Flexible: to account for uncertainties inherent in complex problems;
- Updateable: with new information and data, and supporting search for new relevant data;
- Causal: supporting causal, not merely correlational, analysis and reasoning;
- Actionable: providing actionable results for diagnoses, predictions, optimal mitigations, and strategic planning in pipeline integrity management;
- Transparent: in their development and application (avoiding “black box” tools).
Some of the ML- and AI-related tools applied at Dynamic Risk include Bayesian Networks and other Causal Graphical Models, modern statistical analysis methods such as nonlinear optimization, multivariate linear and logistic regression and factor analysis, as well as Neural Networks.
Question 2: What are some challenges with implementing Machine Learning?
Two specific challenges are worth mentioning here, each followed by a proposed way of overcoming it.
- There is a risk of misinterpreting the terms “Intelligence” and “Learning” when applying AI- and ML-related methodologies and tools in a business environment. Such misinterpretation may lead to setting unrealistic business goals and, hence, to an inability to generate value from AI- and ML-related methodologies and tools.
- Practical AI- and ML-related tools should be able to deal with the data actually available, provide actionable results, and be responsive to the expectations of industry regulators and the public.
- In developing and deploying ML-related applications, it is often a challenge for a business to strike a balance between the innovative potential that ML applications promise (with the unavoidable risk of implementing any novel approach) and the inertia of legacy, tried-and-true practices and analytic tools (which may become obsolete or inadequate).
- Some acceptance of business risk is to be expected as part of implementing AI- and ML-related tools in historically successful pipeline businesses. A successful implementation of AI- and ML-related tools is likely to require cross-functional, business-level decision criteria.
Question 3: How do you use Machine Learning tools in your role at Dynamic Risk?
I develop Bayesian Network (BN) models and apply them to problems within a broad rubric of pipeline integrity challenges that we encounter at Dynamic Risk. A BN model is a graphical probabilistic model in which the variables that characterize the knowledge domain are represented as nodes and the dependencies among the variables are represented as arrows linking the nodes.
The variables included in the realistic datasets encountered in most practical applications, particularly in the pipeline industry, are stochastic in nature, i.e., they are characterized by probability distributions over their possible values. In other words, the values of, and interdependencies among, the variables are subject to uncertainty. BN models can be used to analyze practically any “What If” scenario within the knowledge domain. Such a scenario is expressed as evidence about one, or arbitrarily many, variables in a BN model, thus making their states certain. The BN model then calculates the impact of that evidence on the probability distributions of the remaining variables. A BN analysis consists of repeatedly updating the state of uncertainty over the variables in the analyzed knowledge domain by accurately incorporating specific information as it becomes available (either from observations or as part of hypothetical scenarios of interest). Through the BN updating process, the uncertainty about the knowledge domain decreases as the information about it increases, making it easier to reason and to make decisions toward specific goals.
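The evidence-updating step described above can be sketched in a few lines. The following is a minimal, purely illustrative two-node network (the variable names, priors, and conditional probabilities are invented for this sketch and are not taken from any Dynamic Risk model); setting evidence on one node and recomputing the other node's distribution is exactly an application of Bayes' rule:

```python
# Hypothetical two-node Bayesian Network: ThirdPartyActivity -> PipelineDamage.
# All numbers are illustrative only.

# Prior distribution over the parent node, P(activity)
p_activity = {"high": 0.3, "low": 0.7}

# Conditional probability table, P(damage = True | activity)
p_damage_given = {"high": 0.10, "low": 0.01}

def posterior_activity(damage_observed: bool) -> dict:
    """Update P(activity) given evidence on the damage node (Bayes' rule)."""
    joint = {}
    for a, pa in p_activity.items():
        p_d = p_damage_given[a] if damage_observed else 1.0 - p_damage_given[a]
        joint[a] = pa * p_d  # P(activity) * P(evidence | activity)
    z = sum(joint.values())  # normalizing constant, P(evidence)
    return {a: v / z for a, v in joint.items()}

# Evidence "damage was observed" shifts belief toward high third-party activity.
print(posterior_activity(damage_observed=True))
```

Real BN engines generalize this same computation to arbitrarily many nodes and evidence sets, which is what makes repeated "What If" analysis within one model cheap.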
Compared with other Machine Learning methodologies, two unique features of BNs are especially important in practical applications:
- BN models can be based on a wide variety of data types: from being directly Machine-Learned from “hard data” (such as large datasets) to using sparse data (that is repurposed from external analyses or reports) to even incorporating so-called “soft data” (consisting of expert estimates, judgements, and opinions). Practically any new relevant data can be incorporated into an existing BN model, thus keeping it up-to-date and consistent with changes of the knowledge domain.
- BN models are capable of mathematically encoding causal assumptions about a knowledge domain. Therefore, BN models provide not only information about correlations among variables, but also quantified recommendations for changing the knowledge domain. This can include optimally prioritized Preventive and Mitigating Measures, as well as explanations of causal mechanisms within the knowledge domain (such as finding the main driving cause of effects observed in the field, reflected in data, or considered in hypothetical scenarios).
Due to their flexible and efficient utilization of data, and their ability to serve as a “calculator for causal reasoning” within complex knowledge domains, BN models have proven to be of high utility in pipeline integrity applications.
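The distinction between correlational and causal reasoning mentioned above can be made concrete with a toy model. The sketch below (entirely hypothetical variable names and probabilities, not an actual pipeline model) uses a three-node network with a confounder: observing an exposure mixes the causal effect with the confounding path, while an intervention, the do-operator of causal analysis, cuts the arrow into the exposure and isolates the causal effect:

```python
# Hypothetical causal model: confounder C -> X, C -> Y, and X -> Y.
# Illustrative numbers only.
p_c = 0.5                        # P(C = 1)
p_x_given_c = {0: 0.2, 1: 0.8}   # P(X = 1 | C)
p_y_given = {(0, 0): 0.1, (0, 1): 0.4,
             (1, 0): 0.3, (1, 1): 0.7}  # P(Y = 1 | C, X)

def p_y_observe(x: int) -> float:
    """P(Y=1 | X=x): conditioning lets the confounder's influence leak through."""
    num = den = 0.0
    for c in (0, 1):
        pc = p_c if c else 1.0 - p_c
        px = p_x_given_c[c] if x else 1.0 - p_x_given_c[c]
        num += pc * px * p_y_given[(c, x)]
        den += pc * px
    return num / den

def p_y_do(x: int) -> float:
    """P(Y=1 | do(X=x)): cut the C -> X arrow and average over P(C)."""
    return sum((p_c if c else 1.0 - p_c) * p_y_given[(c, x)] for c in (0, 1))

print(p_y_observe(1), p_y_do(1))  # the two probabilities differ
```

The gap between the two numbers is the contribution of the confounding path; a purely correlational method reports only the first quantity, while a causally encoded BN can also report the second, which is what supports recommendations about Preventive and Mitigating Measures.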
Uncertainty Analysis of Data
One useful application of BN models is to serve as highly efficient tools for Uncertainty Analyses of data. The video file embedded below provides a short demonstration of an Uncertainty Analysis BN model developed for one of our clients. This model is based on a Fault Tree that is well known within the pipeline integrity field for determining the Impact Frequency on a pipeline due to excavation by a third party, utilizing both industry-wide and client-specific failure probability data. As showcased in the video demonstration, Uncertainty Analyses can be efficiently conducted for many “What If” scenarios, all within the same BN model. This provides the analyst or decision-maker with information not only about point estimates of the Impact Frequency (or of any other variable included in the model), but also about “Worst Case” outcomes (thus stress-testing the risk model), while also quantifying the degree of uncertainty associated with each “What If” scenario.
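The flavor of such an Uncertainty Analysis can be conveyed with a simple Monte Carlo sketch. The fault-tree structure, input names, and distributions below are invented for illustration (they are not the client model or its data): an impact frequency is the product of an excavation rate and two failure probabilities, each of which is uncertain, and the analysis reports both a point estimate and a high-percentile "worst case" outcome:

```python
# Illustrative fault-tree-style Uncertainty Analysis by Monte Carlo sampling.
# ImpactFrequency = excavation rate * P(no one-call made) * P(measures fail).
# All inputs and distributions are hypothetical.
import random

random.seed(42)  # reproducible sketch

N = 100_000
samples = []
for _ in range(N):
    rate = random.lognormvariate(-1.0, 0.5)   # excavations per km-year
    p_no_call = random.betavariate(2, 18)     # one-call system not used
    p_fail = random.betavariate(3, 27)        # protective measures fail
    samples.append(rate * p_no_call * p_fail)

samples.sort()
mean = sum(samples) / N
p95 = samples[int(0.95 * N)]  # "worst case" stress-test percentile

print(f"mean impact frequency: {mean:.6f}  95th percentile: {p95:.6f}")
```

A BN model performs the analogous propagation of uncertainty analytically or by efficient inference, which is what makes running many “What If” scenarios within one model practical.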
In conclusion, we have explored what ML is, including real-life ML-related methodologies and tools applied in the pipeline industry, specific challenges with the implementation of ML, and finally, how we are currently leveraging ML tools at Dynamic Risk. For additional information, or to connect directly with one of our experts to learn more, please reach out to your Dynamic Risk Representative or contact us via email at email@example.com
Video Demonstration: Uncertainty Analysis for Third Party Excavation Impact Frequency
About the Author:
Sergiy Kondratyuk, Senior Data Scientist, Risk – Technology Enablement
Sergiy Kondratyuk is a Senior Data Scientist with Dynamic Risk. His educational background is in Mathematical and Theoretical Physics. After receiving his PhD in 2000 from the University of Groningen in The Netherlands, Sergiy completed academic research in the areas of Quantum Field Theory and Nuclear Particle Physics. He has held research positions at Canada’s Particle Accelerator Centre (TRIUMF) in Vancouver, at the University of South Carolina, and at the University of Manitoba. Sergiy has co-authored and published over 30 academic papers in peer-reviewed journals.
The skillset that Sergiy has brought into Dynamic Risk is based on his extensive experience with probabilistic and causal analysis methodologies and models that originated in Artificial Intelligence and have been successfully applied in the areas of Machine Learning, data analysis, quantitative support for predictive and diagnostic reasoning, optimized planning, and decision making under uncertainty. His current work involves development and application of methodologies and tools for evaluating quality of data and data-related processes used by a large transmission pipeline company.