Managing Bias and Risk at Every Step of the AI-Building Process
30 October, 2019 / Articles

Here’s a scenario common to organizations applying machine learning in their business processes. The business and technical teams have aligned on a general problem statement for a machine learning task. Everyone is excited, and the technical team goes off for a few months and experiments with different algorithms on available data, eventually converging on an algorithm they believe achieves the highest performance on the agreed-upon metrics. Proud of their work, they bring results back to the business to integrate into a business process or implement as a feature in a software product.
Before deployment, however, their algorithmic model must be reviewed by a governance team to ensure it satisfies risk-management requirements. The governance team seeks rigorous documentation concerning requirements that the technical team never considered: Can we explain how the algorithm derives its outputs from its inputs? What controls does the system use to protect client privacy? How stable are the input data over long periods of time, especially if we only plan to retrain the model once a month, or even once a year? Can we ensure the algorithm produces fair results across the population of affected clients?
The technical team retorts that no one told them about these requirements, so they didn’t consider them during development. Frustrated, they start again, this time constraining choices about possible algorithms and input data to ensure that the newly articulated risk management requirements are satisfied. Time and effort are wasted. Timelines stretch. Executives wonder why things are taking so long and become anxious they will lag behind competitors.
This lack of communication and coordination results from gaps in the knowledge that business and technical stakeholders have about what it takes to make machine learning work in real-world applications, as well as residual waterfall approaches to technical project management. As the field is still young, many machine-learning developers lack experience in building enterprise applications, and many business stakeholders have insufficient knowledge of machine learning to know what questions to ask as they scope and manage projects. To innovate effectively, project owners need to know what trade-offs and decisions they’ll face while building a machine learning system, and when they should assess these trade-offs to minimize frustration and wasted effort.
Let’s start with the what. Making machine learning work in a business context often requires a series of decisions and trade-offs that can impact model performance. At the heart of the matter lies the structure of machine-learning algorithms, which use data to learn approximate mappings between inputs and outputs that are useful for a business. With standard software, programmers write specific instructions that execute the same operations every time; the trade-off is that these instructions are limited to what can be explicitly articulated in code. With machine learning, by contrast, programmers specify the goal of the program and write an algorithm that helps the system efficiently learn the best input-output mapping from available data to achieve this goal, rather than selecting a particular mapping from the get-go. This approach enables us to tackle scenarios where it’s much harder to write the precise rules (e.g., image recognition, textual analysis, generating video sequences), but the trade-off is that the system’s output can be unpredictable or unintuitive. From this inductive foundation arise many of the nuances of applying machine learning systems, especially for business and technology teams who are used to rules-based computer systems.
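To make the contrast concrete, here is a minimal sketch in Python using scikit-learn. The credit-check scenario, thresholds, and data are all hypothetical; the point is only that one mapping is written by hand while the other is learned from examples:

```python
from sklearn.linear_model import LogisticRegression

# Rules-based software: the input-output mapping is written by hand
# and executes the same explicit instructions every time.
def rules_based_credit_check(income: float, debt: float) -> bool:
    return income > 50_000 and debt / income < 0.4

# Machine learning: we specify the goal (predict approval from past
# outcomes) and let the algorithm learn the mapping from data.
X = [[60_000, 10_000], [30_000, 20_000], [80_000, 5_000], [25_000, 15_000]]
y = [1, 0, 1, 0]  # historical approve/decline outcomes

model = LogisticRegression(max_iter=1_000).fit(X, y)
print(model.predict([[55_000, 12_000]]))  # learned behavior, not hand-written rules
```

The learned mapping can handle inputs the programmers never anticipated, but its behavior is inferred from data rather than guaranteed by explicit rules, which is exactly where the governance questions above come from.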
Often these nuances come to life as trade-offs business teams need to make during system development. For instance, teams may find that the objective the business seeks to predict or optimize cannot be easily measured with available data, so they resort to a measurable proxy. The concern here is that the system will learn to predict this and only this proxy, not the true objective; the further the proxy is from the objective, the less useful the system’s output will be for the business.
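A small simulation can make the proxy problem concrete. In this hypothetical sketch, the true objective (say, long-term customer satisfaction) is only weakly correlated with the measurable proxy (say, click-through), so a model trained on the proxy tells us correspondingly little about the objective:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000

# True objective: e.g., long-term customer satisfaction (hard to measure).
satisfaction = rng.normal(size=n)

# Measurable proxy: e.g., click-through, only loosely tied to satisfaction.
clicks = 0.3 * satisfaction + rng.normal(size=n)

# A model trained on clicks optimizes the proxy, not the objective; the
# weaker this correlation, the less its predictions say about satisfaction.
print(f"proxy-objective correlation: {np.corrcoef(clicks, satisfaction)[0, 1]:.2f}")
```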
Other trade-offs weigh model accuracy against concerns like explainability, fairness, and privacy. If the business is hesitant to adopt a model without an explanation of how and why it maps inputs to outputs, the technical team could constrain the set of potential solutions to algorithms that afford better explainability, but this may come at the cost of reducing model performance. Similarly, if the business is concerned that the system could propagate unfair outcomes to certain client segments, or that the system could expose sensitive user data, the technical team could restrict their attention to algorithms that ensure fairness and privacy, which could also impact performance.
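One way teams surface this trade-off during development is to benchmark a flexible, harder-to-explain model against a constrained, more explainable one on the same data. Here is a minimal sketch with scikit-learn, using synthetic data purely for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the business' training data.
X, y = make_classification(n_samples=2_000, n_features=20, random_state=0)

models = {
    # Flexible but harder to explain to a governance team.
    "gradient boosting": GradientBoostingClassifier(random_state=0),
    # Constrained but explainable: its coefficients can be read directly.
    "logistic regression": LogisticRegression(max_iter=1_000),
}

for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy {score:.3f}")
```

If the accuracy gap is small, the explainable model may be the better business choice; if it is large, the trade-off deserves an explicit decision rather than a default.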
Sometimes inherent error rates make it such that it’s best to cut losses early, particularly in contexts where the costs of making a mistake are high (e.g., user trust could be eroded, or users expect certainty). The technical team could invest heavily in system design to improve accuracy or implement human-in-the-loop decision-making, but if the costs of these investments exceed the potential benefits, cutting losses early may be the best option.
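As a rough illustration of that calculus, consider a back-of-the-envelope comparison of automating fully, adding a human in the loop, or walking away. Every figure below is hypothetical, chosen only to show the shape of the trade-off:

```python
# All figures are hypothetical, for illustration only.
error_rate = 0.08            # inherent error rate of the model on this task
cost_per_mistake = 500.0     # e.g., eroded user trust, remediation, support
value_per_decision = 30.0    # benefit of each automated decision
human_review_cost = 12.0     # per-decision cost of a human in the loop
decisions_per_year = 100_000

# Assume, for simplicity, that human review catches the model's mistakes.
automate = decisions_per_year * (value_per_decision - error_rate * cost_per_mistake)
human_in_loop = decisions_per_year * (value_per_decision - human_review_cost)
cut_losses = 0.0

print(f"automate fully:    {automate:,.0f}")
print(f"human in the loop: {human_in_loop:,.0f}")
print(f"cut losses early:  {cut_losses:,.0f}")
```

With these numbers, the error rate makes full automation a net loss, human review is worthwhile, and cutting losses would only win if review costs exceeded the value of each decision.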
These considerations are too nuanced to be managed in a single working session. Instead, project owners should engage business, end-user, technical, and governance teams in iterative dialogue throughout the system development process.
Borealis AI breaks this process down into the following lifecycle:
- Design: Define the problem and articulate the business case. Determine the business’ tolerance for error and ascertain which regulations, if any, could impact the solution.
- Exploration: Conduct a feasibility study on the available data. Determine whether the data are biased or imbalanced (a minimal check is sketched after this list), and discuss the business’ need for explainability. Depending on the answers to these questions, this phase may require iterating back to the design phase.
- Refinement: Train and test the model (or several potential model variants). Gauge the impact of fairness and privacy enhancements on accuracy.
- Build and ship: Implement a production-grade version of the model. Determine how frequently the model must be retrained and whether its output must be stored, and how these requirements affect infrastructure needs.
- Measure: Document and learn from the model’s ongoing performance. Scale it to new contexts and incorporate new features. Discuss how to manage model errors and unexpected outcomes. Depending on the answers to these questions, this phase may require iterating back to the build-and-ship phase.
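As one example of what the exploration phase can look like in practice, here is a minimal sketch of a feasibility check for label imbalance, overall and per client segment. The function name, column names, and data are all hypothetical:

```python
import pandas as pd

def exploration_report(df: pd.DataFrame, label_col: str, group_col: str) -> None:
    """First-pass feasibility checks: label balance overall and per segment."""
    print("Overall label balance:")
    print(df[label_col].value_counts(normalize=True))
    print(f"\nMean label by {group_col} (large gaps warrant a closer look):")
    print(df.groupby(group_col)[label_col].mean())

# Hypothetical data: loan approvals across two client segments.
df = pd.DataFrame({
    "approved": [1, 0, 1, 1, 0, 0, 1, 0],
    "segment":  ["a", "a", "a", "b", "b", "b", "a", "b"],
})
exploration_report(df, label_col="approved", group_col="segment")
```

Checks like this are cheap to run early, and their answers feed directly into the design-phase conversations about error tolerance and fairness.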