Wednesday, July 12, 2017

When Data Attacks

One way of thinking about the Internet is as a giant matching machine. You have a question, it finds you an answer. You want a flight, it finds you a good deal. You want a date, it can find that for you too. Lots of them, in fact.

But is this the whole story? Not exactly.

A fairly simple problem/solution scenario is how things worked in the days of Web 1.0, a pre-data-collection web that hadn't yet developed, let alone mastered, micro-targeting by attributes such as demographics, psychographics, and location. And before you cry "surveillance!", bear in mind that it is the advertising-supported, data-slicing-and-dicing web that brings so much to all of us each day in the form of news, entertainment, and productivity tools. Not to mention that the systems that optimize online marketing also help filter out what could be called 'noise': if I don't have kids I won't get daycare ads on Facebook, and if I don't have a dog or a cat I won't get coupons for kibble popping up alongside the YouTube videos I watch.

Is this all for the better? As with many things, it depends how you look at it, and it depends who you ask. If you ask mathematician Cathy O’Neil, author of Weapons of Math Destruction, the answer would be no.

At a recent talk held at Microsoft Research, O'Neil began by describing what an algorithm is. "It's something we build to make a prediction about the future…and it assumes that things that happened in the past will happen again in the future." O'Neil explained that algorithms use structures such as decision trees, which chain if/then and yes/no statements together, and draw on historical information, pattern matching, and machine learning to build models that can make thousands or millions of predictions in a fraction of the time it would take a human being with a calculator and a scratch pad.
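
To make that concrete, here is a minimal sketch of the kind of hand-built decision tree O'Neil describes. It is not from her talk; the attributes, thresholds, and the repayment question itself are invented for illustration.

```python
# A hand-built decision tree: if/then and yes/no branches whose
# thresholds would, in a real system, be learned from historical
# data. All names and numbers below are invented for illustration.

def predict_will_repay(applicant):
    """Predict repayment on the assumption that applicants who
    look like past repayers will repay too."""
    if applicant["late_payments"] > 2:        # yes/no split
        return False
    if applicant["income"] >= 50_000:         # if/then split
        return True
    return applicant["years_at_job"] >= 3     # final leaf

# One call per applicant; a machine can score millions of these in
# the time a person with a calculator scores one.
print(predict_will_repay(
    {"late_payments": 0, "income": 42_000, "years_at_job": 5}
))  # True
```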

So what's not to like? The problem, according to O'Neil, is that the agenda of the algorithm is decided by the builder of the algorithm. What goes into the algorithm is necessarily 'curated', and when some variables are selected while others are left out, a value system is embedded in the algorithm.
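
A toy illustration of that curation, with entirely hypothetical feature names: the only difference between the two versions of this hiring model's input is which variables the builder chose to keep, and that choice is the value system.

```python
# Two 'curated' inputs to the same hypothetical hiring model.
# Version 1 lets a geographic proxy in; version 2 deliberately
# leaves it out. Whoever picks the list picks the values.

FEATURES_V1 = ["years_experience", "skills_test_score", "zip_code"]
FEATURES_V2 = ["years_experience", "skills_test_score"]

def curate(candidates, features):
    """Keep only the selected variables; anything left off the
    list is invisible to the model, relevant or not."""
    return [{f: c[f] for f in features} for c in candidates]

candidate = {"years_experience": 4, "skills_test_score": 88,
             "zip_code": "M5V"}
print(curate([candidate], FEATURES_V1))  # zip code shapes the model
print(curate([candidate], FEATURES_V2))  # zip code never exists
```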

These value systems in turn can affect decisions now made by machines that used to be made by humans, such as hiring, creditworthiness, professional evaluation, and insurance eligibility. Researchers, including O'Neil herself, have attempted to uncover the rules embedded in some of these algorithms using Freedom of Information requests, but according to O'Neil many such requests have not been successful. Furthermore, many of the data-driven systems responsible for making millions of decisions are built on proprietary, or 'black box', software architectures that are extremely difficult to reverse-engineer.

But let's bring things back to how data interfaces with you in your daily life. If you've ever wondered, for example, why you often spend half an hour on hold when you call customer support while your friends say they get through right away, the explanation may be more than "we're experiencing higher than normal call volumes." Maybe they are, but maybe, as O'Neil points out, it's something else. She cites the common practice of customer service lines pre-determining whether you're a high-value or low-value customer based on purchase and credit information cross-referenced with your phone number. And, well, you can figure out who gets put through to a real live human operator and who has to listen to extended musical accompaniments of flutes and vibraphones.
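
Here is a minimal sketch of what such routing logic might look like; the scoring data, threshold, phone numbers, and queue names are all assumptions invented for illustration.

```python
# A hypothetical call-routing rule of the kind O'Neil describes:
# look up a value score by caller ID, then decide who waits.

HIGH_VALUE_THRESHOLD = 700  # invented score cutoff

def route_call(phone_number, customer_scores):
    """Return the queue a caller lands in, based on a value score
    cross-referenced with their phone number."""
    score = customer_scores.get(phone_number, 0)  # unknown callers score 0
    if score >= HIGH_VALUE_THRESHOLD:
        return "priority_queue"   # straight to a human operator
    return "hold_queue"           # flutes and vibraphones

# Two callers, same question, different treatment.
scores = {"416-555-0001": 820, "416-555-0002": 340}
print(route_call("416-555-0001", scores))  # priority_queue
print(route_call("416-555-0002", scores))  # hold_queue
```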

O'Neil calls such processes "systematic filtering", and is concerned that machine learning, a key component of artificial intelligence and widely described as the next revolution in computing, "automates the status quo" and in turn creates "pernicious feedback loops" that not only trap people in the biases of the past but also magnify those biases, since machine learning is itself built on recursive loops and neural networks.
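
One way to see how such a loop can magnify rather than merely preserve a bias is a toy simulation. Nothing here comes from O'Neil's book: the groups, rates, and update rule are all invented, but the dynamic, a model retrained on its own past decisions and drifting further from parity each round, is the one she warns about.

```python
# A toy feedback loop: each round, the model is 'retrained' on its
# own previous approvals, so the favored group accumulates positive
# examples and the other does not. All numbers are invented.

rates = {"group_a": 0.60, "group_b": 0.50}  # historical approval rates

for round_num in range(1, 6):
    mean = sum(rates.values()) / len(rates)
    for group in rates:
        # Retraining pushes each group's rate further from the
        # average: the status quo is not just preserved, it grows.
        rates[group] += 0.25 * (rates[group] - mean)
    print(round_num, {g: round(r, 3) for g, r in rates.items()})

# The initial 10-point gap widens every round; after five rounds
# it has roughly tripled, with no one having decided it should.
```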

This was not and is not the intention of any such system, of course. The point of deploying data at scale is to build models at a speed and complexity that far exceed human capability. But as with any technological innovation, there are unintended consequences, and the decisions made by data-driven systems are no exception.

For an overview of Cathy O'Neil's book "Weapons of Math Destruction," click here.

This post also appears, in a slightly revised version, on the blog of the American Marketing Association, Toronto Chapter.