We’re already seeing alarm over bizarre YouTube channels that attempt to monetize children’s TV brands by scraping the video content off legitimate channels and adding their own advertising and keywords. Many of these channels are shaped by paperclip-maximizer advertising AIs that are simply trying to maximize their search ranking on YouTube. Add neural-network-driven tools for inserting Character A into Video B to click-maximizing bots, and things are going to get very weird (and nasty). And they’re only going to get weirder when these tools are deployed for political gain.
We tend to evaluate the inputs from our eyes and ears much less critically than what random strangers on the internet tell us—and we’re already too vulnerable to fake news as it is. Soon they’ll come for us, armed with believable video evidence. The smart money says that by 2027 you won’t be able to believe anything you see in video unless there are cryptographic signatures on it, linking it back to the device that shot the raw feed—and you know how good most people are at using encryption? The dumb money is on total chaos.
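As a rough illustration of what “cryptographic signatures on it, linking it back to the device that shot the raw feed” might involve, here is a minimal Python sketch using an Ed25519 device key. The key handling and the idea of signing a hash of the raw footage are assumptions for illustration, not any existing camera standard.

```python
# Minimal sketch: a camera signs each raw clip with a per-device key, so anyone
# holding the device's public key can check the footage is untouched.
# The key handling and file layout here are illustrative assumptions,
# not an existing camera or codec standard.
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

device_key = Ed25519PrivateKey.generate()    # imagined as burned in at manufacture
device_public_key = device_key.public_key()  # published in some verifiable registry

def sign_clip(raw_bytes: bytes) -> bytes:
    """Hash the raw footage and sign the digest with the device key."""
    return device_key.sign(hashlib.sha256(raw_bytes).digest())

def verify_clip(raw_bytes: bytes, signature: bytes) -> bool:
    """True only if the footage matches a signature from this device."""
    try:
        device_public_key.verify(signature, hashlib.sha256(raw_bytes).digest())
        return True
    except InvalidSignature:
        return False

clip = b"...raw video frames..."
sig = sign_clip(clip)
assert verify_clip(clip, sig)
assert not verify_clip(clip + b"one altered frame", sig)
```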
Two summers ago, Courtenay Cotton led a workshop on machine learning that I attended with a New York–based group called the Women and Surveillance Initiative. It was a welcome introduction to the subject and a rare opportunity to cut through the hype to understand both the value of machine learning and the complications of this field of research. In our recent interview, Cotton, who now works as lead data scientist at n-Join, once again offered her clear thinking on machine learning and where it is headed.
If we believe that, indeed, “software is eating the world,” that we are living in a moment of extraordinary technological change, that we must – according to Gartner or the Horizon Report – be ever-vigilant about emerging technologies, that these technologies are contributing to uncertainty, to disruption, then it seems likely that we will demand a change in turn to our educational institutions (to lots of institutions, but let’s just focus on education). This is why this sort of forecasting is so important for us to scrutinize – to do so quantitatively and qualitatively, to look at methods and at theory, to ask who’s telling the story and who’s spreading the story, to listen for counter-narratives.
Tim O'Reilly writes about the reality that more and more of our lives – including whether you end up seeing this very sentence! – are in the hands of “black boxes”: algorithmic decision-makers whose inner workings are a secret from the people they affect.
O'Reilly proposes four tests to determine whether a black box is trustworthy:
1. Its creators have made clear what outcome they are seeking, and it is possible for external observers to verify that outcome.
2. Success is measurable.
3. The goals of the algorithm’s creators are aligned with the goals of the algorithm’s consumers.
4. The algorithm leads its creators and its users to make better long-term decisions.
O'Reilly goes on to test these assumptions against some of the existing black boxes that we trust every day, like aviation autopilot systems, and shows that this is a very good framework for evaluating algorithmic systems.
But I have three important quibbles with O'Reilly’s framing. The first is absolutely foundational: the reason that these algorithms are black boxes is that the people who devise them argue that releasing details of their models will weaken the models’ security. This is nonsense.
For example, Facebook tweaked its algorithm to downrank “clickbait” stories. Adam Mosseri, Facebook’s VP of product management, told TechCrunch, “Facebook won’t be publicly publishing the multi-page document of guidelines for defining clickbait because ‘a big part of this is actually spam, and if you expose exactly what we’re doing and how we’re doing it, they reverse engineer it and figure out how to get around it.’”
There’s a name for this in security circles: “Security through obscurity.” It is as thoroughly discredited an idea as is possible. As far back as the 19th century, security experts have decried the idea that robust systems can rely on secrecy as their first line of defense against compromise.
The reason the algorithms O'Reilly discusses are black boxes is because the people who deploy them believe in security-through-obscurity. Allowing our lives to be manipulated in secrecy because of an unfounded, superstitious belief is as crazy as putting astrologers in charge of monetary policy, no-fly lists, hiring decisions, and parole and sentencing recommendations.
So there’s that: the best way to figure out whether we can trust a black box is to smash it open, demand that it be exposed to the disinfecting power of sunshine, and give no quarter to the ideologically bankrupt security-through-obscurity court astrologers of Facebook, Google, and the TSA.
Then there’s the second issue, which is important whether or not we can see inside the black box: what data was used to train the model? Or, in traditional scientific/statistical terms, what was the sampling methodology?
Garbage in, garbage out is a principle as old as computer science, and sampling bias is a problem that’s as old as the study of statistics. Algorithms are often deployed to replace biased systems with empirical ones: for example, predictive policing algorithms tell the cops where to look for crime, supposedly replacing racially biased stop-and-frisk with data-driven systems of automated suspicion.
But predictive policing training data comes from earlier, human-judgment-driven stop-and-frisk projects. If the cops only make black kids turn out their pockets, then all the drugs, guns and contraband they find will be in the pockets of black kids. Feed this data to a machine learning model and ask it where the future guns, drugs and contraband will be found, and it will dutifully send the police out to harass more black kids. The algorithm isn’t racist, but its training data is.
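A toy simulation makes that point concrete (the neighbourhood labels and rates below are invented for illustration): give two neighbourhoods identical underlying contraband rates but search one of them far more often, and the recorded “finds” that any model would be trained on simply mirror where the searches happened.

```python
# Toy illustration: equal true contraband rates, unequal search rates.
# The recorded "finds" a predictive model would be trained on mirror
# where police searched, not where contraband actually is.
# All numbers are invented for illustration.
import random

random.seed(0)
TRUE_CONTRABAND_RATE = 0.05          # identical in both neighbourhoods
SEARCH_RATE = {"A": 0.40, "B": 0.05} # neighbourhood A is searched 8x as often

recorded_finds = {"A": 0, "B": 0}
for neighbourhood in ("A", "B"):
    for _ in range(10_000):          # 10,000 residents each
        carrying = random.random() < TRUE_CONTRABAND_RATE
        searched = random.random() < SEARCH_RATE[neighbourhood]
        if carrying and searched:
            recorded_finds[neighbourhood] += 1

print(recorded_finds)
# Roughly {'A': ~200, 'B': ~25}: a model trained on these records "learns"
# that A is about 8x as criminal even though the underlying rates are equal,
# and then sends more searches to A, reinforcing the skew.
```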
There’s a final issue, which is that algorithms have to have their models tweaked based on measurements of success. It’s not enough to merely measure success: the errors in the algorithm’s predictions also have to be fed back to it, to correct the model. That’s the difference between Amazon’s sales-optimization systems and automated hiring systems. Amazon’s systems predict ways of improving sales, which the company tries: the failures are used to change the model to improve it. But automated hiring systems blackball some applicants and advance others, and the companies that make these systems don’t track whether the excluded people go on to be great employees somewhere else, or whether the recommended hires end up stealing from the company or alienating its customers.
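A tiny invented example of that feedback gap: a system that observes the outcome of every decision can count all of its mistakes, while a screener that only ever sees the people it advanced has no record of the good candidates it rejected.

```python
# Sketch of the feedback gap, with invented data. "observed" below is the
# only thing the screener can ever learn from.
decisions = [
    {"candidate": 1, "hired": True,  "actual_quality": "good"},
    {"candidate": 2, "hired": False, "actual_quality": "good"},  # wrongly rejected
    {"candidate": 3, "hired": True,  "actual_quality": "bad"},
    {"candidate": 4, "hired": False, "actual_quality": "bad"},
]

# A sales-style system sees every outcome, so all of its errors are countable:
full_feedback_errors = sum(
    1 for d in decisions
    if d["hired"] != (d["actual_quality"] == "good")
)

# A hiring screener only observes the people it advanced; its false negatives
# (candidate 2) simply vanish from its training data:
observed = [d for d in decisions if d["hired"]]
screener_visible_errors = sum(
    1 for d in observed
    if d["actual_quality"] == "bad"
)

print(full_feedback_errors, screener_visible_errors)  # 2 vs 1: half the errors are invisible
```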
I like O'Reilly’s framework for evaluating black boxes, but I think we need to go farther.
The futch ignores complexity. The futch denies how the internet amplifies existing hierarchies and upholds structural inequality. The futch is every broken promise of every new app or internet service. There’s always demand for a more legible future. Futch-peddling is about as noble a profession as astrology, and one with about as little accountability.
Prediction is an industry, and its product is a persuasive set of hopes and fears that we’re trained or convinced to agree upon. It’s a confidence trick. And its product comes so thick and fast that, like a plot hole in an action movie, we’re carried past the obvious failures and the things that wouldn’t even make sense if we had more than five seconds to think about them.
Predictive policing software packages are being adopted across mainland Europe, too. In Germany, researchers at the Institute for Pattern-based Prediction Techniques (IfmPt) in Oberhausen have developed a system called Precobs for tackling burglaries. Precobs works by analysing data on the location, approximate date, modus operandi and stolen items from burglaries going back up to 10 years. Based on this information, it predicts where burglaries are likely to happen next. The prediction is tightly defined: an area with a radius of about 250 metres, and a time window for the crime of between 24 hours and 7 days. Officers are then advised to focus their resources on the flagged area.
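The flagging rule described there is simple enough to sketch. The Python below is a loose illustration of a near-repeat heuristic using the 250-metre radius and 24-hour-to-7-day window mentioned in the text; it is not the actual Precobs implementation, and the data fields are assumptions.

```python
# Loose sketch of a near-repeat flagging rule like the one described above:
# after a trigger burglary, advise patrols to watch roughly 250 m around it
# for a window of between 24 hours and 7 days. Illustrative only; the fields
# and defaults are assumptions, not the real Precobs software.
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import hypot

@dataclass
class Burglary:
    x_m: float            # position in metres on a local grid
    y_m: float
    when: datetime
    modus_operandi: str
    stolen_items: str

@dataclass
class Alert:
    centre_x_m: float
    centre_y_m: float
    radius_m: float
    window_start: datetime
    window_end: datetime

def flag_area(trigger: Burglary, window: timedelta = timedelta(days=3)) -> Alert:
    """Flag ~250 m around the trigger for a window between 24 h and 7 days long
    (3 days here is an arbitrary illustrative choice)."""
    assert timedelta(hours=24) <= window <= timedelta(days=7)
    return Alert(
        centre_x_m=trigger.x_m,
        centre_y_m=trigger.y_m,
        radius_m=250.0,
        window_start=trigger.when,
        window_end=trigger.when + window,
    )

def is_covered(alert: Alert, x_m: float, y_m: float, when: datetime) -> bool:
    """Does a location and time fall inside the alert's radius and window?"""
    inside = hypot(x_m - alert.centre_x_m, y_m - alert.centre_y_m) <= alert.radius_m
    return inside and alert.window_start <= when <= alert.window_end
```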
It was not the individual events that made 2014 so topsy-turvy: after all, what could top the 1991 Soviet collapse for sheer disruption of the status quo? The year instead was remarkable for the number of big, consequential and utterly unforeseen events—Russia’s invasion of Ukraine, the rise of ISIL, the diplomatic breakthrough between the US and Cuba, the emergence of US shale oil and the collapse of oil prices, not to mention a clutch of other economic, business and market events. All in all, it has been evident for months that 2014 was a staggering maelstrom of surprises.
Google Flu Trends, which launched in 2008, monitors web searches across the US to find terms associated with flu activity such as “cough” or “fever”. It uses those searches to predict up to nine weeks in advance the number of flu-related doctors’ visits that are likely to be made. The system has consistently overestimated flu-related visits over the past three years, and was especially inaccurate around the peak of flu season – when such data is most useful. In the 2012/2013 season, it predicted twice as many doctors’ visits as the US Centers for Disease Control and Prevention (CDC) eventually recorded. In 2011/2012 it overestimated by more than 50 per cent.
There are serious differences between predictions, bets, and exposures that have a yes/no type of payoff, the “binaries”, and those that have varying payoffs, which we call the “vanilla”. Real world exposures tend to belong to the vanilla category, and are poorly captured by binaries. Vanilla exposures are sensitive to Black Swan effects, model errors, and prediction problems, while the binaries are largely immune to them. The binaries are mathematically tractable, while the vanilla are much less so. Hedging vanilla exposures with binary bets can be disastrous – and because of the human tendency to engage in attribute substitution when confronted by difficult questions, decision-makers and researchers often confuse the vanilla for the binary.
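In symbols (the notation here is mine, chosen only to make the contrast concrete, not to reproduce the original formalism):

```latex
% Notation is mine, chosen only to make the contrast above concrete.
\[
  \underbrace{B(x) \;=\; \mathbf{1}_{\{x > K\}} \in \{0,1\}}_{\text{binary: payoff capped at }1}
  \qquad\text{vs.}\qquad
  \underbrace{V(x) \;=\; \max(x - K,\, 0)}_{\text{vanilla: payoff unbounded in the tail}}
\]
% A model error in the tail of x moves E[B(x)] by at most 1, but can move
% E[V(x)] without limit -- which is why vanilla exposures are sensitive to
% Black Swan effects while binaries are largely immune, and why hedging a
% vanilla exposure with binary bets can be disastrous.
```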
By contrast I emphasize the role of misconceptions, misinterpretations and a sheer lack of understanding in shaping the course of events. I focus on the process of change rather than on the eventual outcome. The process involves reflexive feedback loops between the objective and subjective aspects of reality. Fallibility ensures that the two aspects are never identical. That is where my framework differs from mainstream economics.
Writing down precise predictions is like spaced repetition: it’s brutal to do because it is almost a paradigmatic long-term activity, being wrong is physically unpleasant, and it requires two skills, formulating precise predictions and then actually predicting. (For spaced repetition: writing good flashcards and then actually reviewing them regularly.) There are lots of exercises to try to calibrate yourself (trivia questions, obscure historical events, geography, etc.), but they only take you so far; it’s the real-world near-term and long-term predictions that give you the most food for thought, and those require a year or three at minimum.
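One concrete way to do the “actually predicting” half is to log every forecast as a probability and score the log later. The sketch below uses the Brier score; the example entries are invented.

```python
# Minimal sketch of scoring a personal prediction log with the Brier score:
# record each forecast as a probability, then average the squared gap between
# the probability and what actually happened. Lower is better; always saying
# 50% scores 0.25. The example entries are invented.
predictions = [
    {"claim": "Candidate X wins the election", "p": 0.7, "happened": True},
    {"claim": "Project ships by March",        "p": 0.9, "happened": False},
    {"claim": "Paper gets accepted",           "p": 0.3, "happened": True},
]

def brier_score(log):
    return sum((entry["p"] - float(entry["happened"])) ** 2 for entry in log) / len(log)

print(round(brier_score(predictions), 3))
# (0.09 + 0.81 + 0.49) / 3 = 0.463 here: worse than guessing 50% on everything,
# a sign these (invented) forecasts were badly calibrated.
```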