Smarter and more adaptive machines are rapidly becoming as much a part of our lives as the internet, and more of our decisions are being handed over to intelligent algorithms that learn from ever-increasing volumes and varieties of data.
As these “robots” become a bigger part of our lives, we don’t have any framework for evaluating which decisions we should be comfortable delegating to algorithms and which ones humans should retain. That’s surprising, given the high stakes involved.
I propose a risk-oriented framework for deciding when and how to allocate decision problems between humans and machine-based decision makers. I’ve developed this framework based on the experiences that my collaborators and I have had implementing prediction systems over the last 25 years in domains like finance, healthcare, education, and sports.
The framework differentiates problems along two independent dimensions: predictability and cost per error.
Consider the first of these dimensions: predictability.
Predictability refers to how much better than random we should expect today’s best predictive systems to perform.
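To make this notion concrete, one rough way to score predictability is to ask what fraction of the gap between chance performance and perfect performance a predictive system closes. The short Python sketch below illustrates that scoring with made-up accuracy figures; it is an illustration of the idea, not a measurement of any real system.

```python
# A minimal sketch of one way to quantify "predictability": how far a model's
# accuracy exceeds the chance baseline, rescaled so that 0 means "no better
# than a coin toss" and 1 means "perfectly deterministic."
# All numbers below are illustrative, not measured.

def predictability(model_accuracy: float, chance_accuracy: float) -> float:
    """Fraction of the gap between chance and perfection that the model closes."""
    if chance_accuracy >= 1.0:
        return 1.0  # the task is already fully deterministic
    return max(0.0, (model_accuracy - chance_accuracy) / (1.0 - chance_accuracy))

print(predictability(0.50, 0.50))   # coin toss              -> 0.0
print(predictability(0.52, 0.50))   # short-horizon trading  -> 0.04
print(predictability(0.95, 0.50))   # spam filtering         -> 0.90
print(predictability(0.999, 0.50))  # highly structured task -> ~1.0
```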
This diagram presents examples of problems ordered by their predictability given the current state of the art in machine learning and AI technology. At the extreme left is a coin toss, which has “zero signal”: an activity in which prediction won’t be any better than random. The extreme right represents purely deterministic, mechanical decision problems.
Moving from left to right between these extremes, we begin with the example of long-term investing, where evidence – and economic theory – tell us that humans do poorly, typically no better than random. As the prediction horizon becomes shorter (short-term and then high frequency trading), however, predictability increases, albeit only marginally. Moving further right, credit card fraud detection and spam filtering have higher levels of predictability, but current-day systems still generate significant numbers of false positives and false negatives. At the extreme right, we place highly structured problems with the most predictability. Driverless cars, for example, operate in domains in which the physics is well understood. While there is some uncertainty associated with the actions of other vehicles and the environment, machines can still learn to drive much more safely than humans can, on average.
Ordering tasks along this dimension makes clear where the current automation challenges and opportunities are. However, while it may be tempting to limit the analysis to predictive power and infer that “high signal problems can be robotized and low signal ones require humans,” this one-dimensional view is incomplete. To decide correctly whether to hand off a decision to a robot, one must also consider the consequences of a mistake, a variable that is at least as important as prediction accuracy, and perhaps more so.
In this fuller, two-dimensional representation (which I call a DA-MAP), horizontal position indicates predictability, just as in the earlier graphic. Cost per error, which can be expressed in monetary or other utility units (depending on the problem), is plotted along the vertical axis.
Adding this second dimension provides important new insights.
Consider two of the relatively higher-predictability problems mentioned earlier: spam filtering and driverless cars. Spam filtering is a tricky “adversarial” problem in which spammers constantly try to fool the filter. But because filters are tuned not to block legitimate content, the cost of a false positive is kept very low, and the occasional spam message that slips through (a false negative) costs little as well. An error by a driverless car, in contrast, could be very costly. The costs of fighter drone decisions (middle right) are also clearly high (accidentally bombing a hospital instead of a munitions depot, say), but this problem differs from driverless cars in at least two ways: drones are used in warfare, where there’s more tolerance for errors than on suburban roadways, and using them mitigates the substantial risk to pilots flying over enemy territory.
Prediction errors in healthcare can also have significant costs. For example, failing to predict diabetes when it is present (a false negative) could result in severe outcomes such as the loss of a limb. False positives could result in prescribing medication or conducting tests when none are needed.
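One rough way to see how such asymmetries add up is to fold the two error rates and their per-error costs into a single expected cost per decision, as in the sketch below. All of the rates and dollar figures are hypothetical placeholders, not estimates from any real spam filter or screening program.

```python
# A minimal sketch of how asymmetric error costs combine into an expected
# cost per automated decision. All rates and costs are hypothetical.

def expected_cost_per_decision(fp_rate: float, fn_rate: float,
                               cost_fp: float, cost_fn: float) -> float:
    """Average cost of one automated decision, given error rates and per-error costs."""
    return fp_rate * cost_fp + fn_rate * cost_fn

# Spam filtering: both kinds of error are cheap, so automation is easy to justify.
spam = expected_cost_per_decision(fp_rate=0.001, fn_rate=0.02,
                                  cost_fp=1.0, cost_fn=0.1)

# Diabetes screening: a missed diagnosis (false negative) is far more costly
# than an unnecessary test or prescription (false positive).
diabetes = expected_cost_per_decision(fp_rate=0.05, fn_rate=0.05,
                                      cost_fp=200.0, cost_fn=50_000.0)

print(f"spam:     {spam:.4f}")   # tiny expected cost per message
print(f"diabetes: {diabetes:.2f}")  # large expected cost per patient
```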
Of course, the location of a particular problem on this two-dimensional representation also changes as a function of technological and societal changes. Improvements in predictive capability from more data and better algorithms shift problems to the right. (These shifts are illustrated by orange arrows.) Additional regulatory burdens increase the cost of errors and so move a problem up, while less regulation or reduction in liability would move it down (blue arrows). Changes in societal norms and values – loss of public support for drone warfare, for example – would also result in changes to the map.
The DA-MAP also shows examples of movements for the various problems, along with possible “automation frontiers” between human- and machine-appropriate decision problems.
An automation frontier (represented by the dotted lines) is an upward-sloping curve marking the current boundary between decisions suitable for machines and those that should remain with humans: the higher the cost per error, the higher the level of predictability required before a problem can be automated. The convex frontier in the figure represents a more stringent automation barrier than the linear one.
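To make the frontier concrete, here is a rough Python sketch of the underlying logic: a problem, placed on the DA-MAP by its predictability and cost per error, is a candidate for automation only when its predictability clears the threshold implied by its cost. The coordinates and the frontier function below are illustrative assumptions, not values taken from the actual map, and the frontier’s exact shape (linear or convex) is what controls how stringent the barrier is.

```python
# A minimal sketch of the automation-frontier logic on the DA-MAP.
# Coordinates and the frontier function are illustrative assumptions.

problems = {
    # name: (predictability in [0, 1], cost per error in arbitrary utility units)
    "online advertising":     (0.70, 1.0),
    "high frequency trading": (0.55, 5.0),
    "spam filtering":         (0.90, 2.0),
    "diabetes screening":     (0.65, 500.0),
    "driverless cars":        (0.95, 1200.0),
}

def required_predictability(cost_per_error: float) -> float:
    """The higher the cost per error, the more predictability automation requires."""
    return min(1.0, 0.4 + 0.0006 * cost_per_error)

for name, (predictability, cost) in problems.items():
    if predictability >= required_predictability(cost):
        verdict = "below the frontier: candidate for automation"
    else:
        verdict = "above the frontier: keep humans in the loop"
    print(f"{name:22s} -> {verdict}")
```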
Below the automation frontier, we see several problems, such as high frequency trading and online advertising, which have already been automated to a large degree due to the low cost per error relative to the benefits of reliable and scalable decision making. In contrast, above the frontier, we find that even the best current diabetes prediction systems still generate too many false positives and negatives, each with a cost that is too high to justify purely automated use. This is why doctors are still integrally involved in judging patients’ risk for diabetes. On the other hand, the availability of genomic and other personal data could improve prediction accuracy dramatically (long orange arrow) and create trustworthy robotic healthcare professionals in the future.
Changes in predictability and cost per error can nudge a problem into or out of the robot zone. For example, as driverless cars improve and we become more comfortable with them, the introduction of laws limiting their liability could facilitate the emergence of insurance markets, which should drive down the cost of error.
The Decision Automation Map can be used by managers, investors, regulators, and policy makers to answer questions regarding automated decision making. It can help people prioritize automation initiatives, and it can highlight problems for which the required expertise is learnable by machines from data with minimal preprogramming and for which error costs are low.
Perhaps the biggest challenge for the deployment of data-driven learning machines is the uncertainty associated with how they will deal with “edge cases” encountered for the first time, such as the obstacles that caused Google’s driverless car to have a minor accident. Humans extend common sense intuitively to bizarre or novel situations, but there remains significant uncertainty about what a machine has learned and how it will act in such cases, and the outcome could be much worse. The larger this uncertainty, the less willing we will be to trust machines with decisions over good old evolution, intuition, and common sense.
For society, the most vexing concern is whether automation will render millions of human jobs obsolete. In the early 1960s, the Nobel laureate in economics Herbert Simon predicted that although many “programmable” decisions in business would be automated within a few decades, worries about the “bogeyman of automation” were misplaced. So far, Simon’s projections have turned out to be prescient on both counts, as automation continues to create new jobs and lifestyles for humans. What remains to be seen, however, is whether the new breed of machines that can see, hear, read, and reason will eliminate more human jobs than they create.
Vasant Dhar is a professor of information systems at New York University’s Stern School of Business and the editor-in-chief of the journal Big Data.