A few months ago I wrote a series of blog posts on what is known about the limits of human prediction. Immediately after that series concluded, I flew back to Kuching for the Chinese New Year holidays … and a novel coronavirus outbreak spilled out from China and over to Singapore, Malaysia, Thailand, and Vietnam.
I didn’t know much about coronaviruses, and I knew nothing of viral epidemiology (at least, not before COVID-19 appeared and I began reading up a storm). But in the last days of January, I began to think that this outbreak would be a good opportunity to practice some of the forecasting techniques from the literature that I had covered at the end of 2019. I started making forecasts on the shape and duration of the pandemic.
To my surprise, this quickly became more than a recreational exercise.
I’m writing this from my hometown of Kuching, Sarawak, which I have not left since the Chinese New Year. In theory, I could travel back to Singapore and Vietnam. But my aging parents urged me to delay my flights, in order to wait out the worst of the pandemic. (Kuching is a relatively isolated city; it emerged from the 2003 SARS and 2009 H1N1 epidemics unscathed.) I decided to go with their wishes, partly because I knew I could work from anywhere in the world, and partly because I didn’t want to fight them on a risk that, at the time, we knew very little about.
So my forecasting activities suddenly took on a higher priority. In the early days of the outbreak, I wanted to forecast the likelihood of containment. As the weeks passed, I focused my research on a prediction for a pandemic peak. My goal? I wanted to know when it was ‘safe’ to purchase a ticket back to Singapore and Vietnam.
I want to talk a little about what I think I did well, and what I think I did badly during this period. This should be of interest to you if you’ve followed along with my series late last year, and if you’re interested in applying the same analytical techniques to your career.
I should clarify: this is not a post predicting outcomes for the current COVID-19 pandemic. I am not a superforecaster; my predictions should count for very little. Instead, this is an accounting of lessons I’ve learnt and mistakes I’ve made while putting the superforecasting process to use on a highly uncertain event in 2020. It is written with the practitioner in mind.
Superforecasting-style tournaments enforce a particular style of forecasting. You cannot say “I think there’s a good chance the virus will decimate China”, or “I think Kuching will escape the pandemic”; you must use precise yes-or-no statements with zero ambiguity and clear deadlines. I’ve covered this in detail in my summary of Philip Tetlock’s book, but the short of it is that you must stick to a few strict rules, or run the risk of lying to yourself when forecasts turn out differently from your expectations.
Here were the forecasting statements I picked, on January 31:
- China will hit at least 8000 cases by the end of February 2020.
- The novel Coronavirus will be under control — that is, have zero new cases — by the end of April 2020.
My initial forecast on January 31 for the first statement was 80%, and for the second, 75%.
It’s almost comical how badly I assessed the overall situation. On the 31st of January, China’s case count stood at 5947. A few days later, it became clear that the case count would exceed 8000. So I scored a good win on my first prediction, given that I had predicted it would come true with 80% confidence.
But it turned out that I got that prediction right by pure chance. I was comparing the new outbreak with SARS, which was contained within two months of its initial discovery. I thought that this new virus would exceed SARS’s total case count of 8098, but not by much. (Remember, in late January, the vast majority of informed reports were comparing it to SARS, and early statistical analyses of the Wuhan outbreak put the basic reproduction number, R0, at the low end of 2.x; SARS had an R0 of 2-4.)
As of today, China’s case count stands at 80,739, a ridiculous order of magnitude more than the total case counts of SARS (8098) and MERS (2000+). My win on the first prediction was for the wrong reasons. I should’ve picked a different, more revealing statement to forecast on. If I had said “10,000 or 20,000 by the end of February”, that might have been a more illuminating question to answer.
So, first lesson: pick your forecasting statements carefully. I think this is obvious when you think about it, but it isn’t as clear when you’re reading Tetlock’s book. In the Good Judgment Project, Tetlock and his collaborators picked forecasting statements for tournament participants to predict. In the real world, you would have to pick forecasting statements on your own. The wording of these statements matters a great deal to the thing you’re trying to forecast.
My second prediction, that the novel coronavirus would be under control (zero new cases) by the end of April 2020, fared far worse. My initial estimate was 75%, based on a comparison to SARS. A week later, I lowered it to 70%. Two weeks after that, I dropped my estimate to 30%, and then later to 27%. I’m likely to get a terrible score on this; COVID-19 outbreaks appear to take about two months to control from the date of first case detection in a population, and it seems to just be getting started in the US and elsewhere.
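For a rough sense of what a ‘terrible score’ might look like in numbers, here is a minimal sketch of a Brier score, the squared-error measure commonly used to grade probability forecasts. It uses the simple one-sided form (0 is perfect, 1 is maximally wrong); the version described in Tetlock’s book sums over both outcomes and so runs from 0 to 2. Two assumptions of mine are baked in: the question resolves as ‘did not happen’, and the four forecasts are weighted equally rather than by how long each one stood.

```python
def brier(probability, outcome):
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (probability - outcome) ** 2

# The four estimates quoted above for the "under control by end of April" statement.
forecasts = [0.75, 0.70, 0.30, 0.27]

# ASSUMPTION: the statement resolves as false (0). It had not resolved at the time of writing.
assumed_outcome = 0

scores = [brier(p, assumed_outcome) for p in forecasts]

# ASSUMPTION: a plain average; a tournament would likely weight by days each forecast stood.
mean_score = sum(scores) / len(scores)

for p, s in zip(forecasts, scores):
    print(f"forecast {p:.0%} -> Brier {s:.4f}")
print(f"mean Brier: {mean_score:.4f}")
```

Under these assumptions, the early 75% and 70% forecasts do most of the damage, which is the practical argument for updating quickly once the evidence turns.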
What I Did Well
Here are a few things I think I did well:
I checked base rates for everything — The literature on forecasting is clear on this point: whenever new information appears, calibrate that information by checking against a base rate. For instance, a few weeks ago, an article appeared in the New York Times asserting that people could be reinfected from the virus. If you were reading that piece, how would you react?
My Twitter feed had a collective freak-out over this article. One person I followed even went so far as to say that “COVID-19 could be the worst epidemic in human history”. But I remembered what Tetlock and company had cautioned in their work: it was far more important to check the base rate for the thing (in this case, coronavirus reinfections) than it was to check for the details of the New York Times story. In judgment and decision making, this is often called ‘taking the outside view’.
So: how many coronaviruses could reinfect you immediately after recovering from an initial bout with the virus? The answer: absolutely zero. What is known is that coronavirus resistance generally lasts for a few years, which means you could, as with the seasonal flu, get reinfected a couple of years after you first fell sick. But the odds of being instantly reinfected by the same coronavirus were very, very low (I would put them at under 5%). More importantly, extraordinary claims like this had to be backed by extraordinary evidence. A New York Times article based on field reports didn’t meet this bar. So I ignored the piece, left my estimates as they were, and waited for further evidence.
This turned out to be the right thing to do. The consensus now appears to be that these cases were mistakenly cleared of the virus, but were actually still infected. I gave myself a pat on the back, and moved on.
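The reasoning above (start from the base rate, then move only as far as the evidence warrants) can be sketched as a Bayesian update in odds form. This is my own illustration, not something from Tetlock’s book: the 5% prior is the upper bound I mentioned, and the likelihood ratios are made-up numbers standing in for ‘weak field reports’ versus ‘strong lab evidence’.

```python
def update(prior, likelihood_ratio):
    """Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    prior_odds = prior / (1 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)

base_rate = 0.05  # the "under 5%" outside-view estimate for immediate reinfection

# HYPOTHETICAL likelihood ratios: how much more likely the report is if the
# claim is true than if it is false. These values are illustrative only.
weak_report = 2
strong_evidence = 50

print(f"after a weak report:     {update(base_rate, weak_report):.1%}")
print(f"after strong evidence:   {update(base_rate, strong_evidence):.1%}")
```

With a low base rate, a single weak report barely moves the estimate, which is roughly why I felt comfortable waiting for more evidence.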
I evaluated my beliefs on a percentage scale … but found this difficult to do — Psychologist Amos Tversky used to say that left to themselves, most people have only three notches in the probability scale in their heads: ‘gonna happen, not gonna happen, and maybe’.
As it turns out, this is an incredibly human thing to have. Throughout the coronavirus outbreak, I forced myself to grade my beliefs on a percentage scale (“The coronavirus is confirmed to spread by asymptomatic individuals by end March 2020”: 77%). But in practice, my brain would map these percentage numbers into ‘gonna happen, not gonna happen, and maybe’. And then I noticed that my behaviour would invariably be governed by some form of ‘maybe’ × ‘how bad?’ = ‘oh noes’.
This is really, really difficult stuff, and I don’t know what to do about it just yet. But I’m rather pleased by my cultivation of this habit: I think forcing myself to come up with these probability ratings makes my inchoate thoughts more rigorous. The rigour comes from the fact that I am now able to criticise the strengths of my own beliefs — and allow others to do the same.
I picked a community of scientists to follow on Twitter — This was probably the lowest-effort, highest-return thing I did. (I originally wrote about this in a Twitter thread).
I think that if you don’t want to do the difficult work of calibrating your opinions on the epidemic, one hack you could do is to find a community of scientists on the internet, and follow their thinking over a period of days. Domain experts are likely to be better calibrated in their responses to a rapidly developing situation. They also tend to link to papers and articles with a healthy dose of scepticism, and are overall better informed than the vast majority of people — journalists included.
Following them has had one other benefit: it helps in blocking out the noise from loud investors, mathy engineers, and that one dietary epidemiologist who presents himself as a virus expert, but is really using the outbreak to further his career.
I recommend Ferris Jabr’s Coronavirus list, but there are probably others.
What I Did Badly
The worst thing I did was picking the wrong frame of reference at the beginning of my forecasting activities:
I chose to compare to SARS — and stuck way too long to that frame of reference — In late January, the vast majority of comparisons were with SARS and MERS, which made sense: these were two coronavirus outbreaks that had sparked an international pandemic response in the past.
But as the weeks passed, and as COVID-19 cases began to outnumber SARS and MERS case numbers, it became clear that the better comparison was with the 2009 H1N1 outbreak. Like COVID-19, the H1N1/09 virus had a higher infection rate and a lower mortality rate when compared against SARS or MERS. And unlike SARS or MERS, H1N1/09 took nearly a year to get under control.
My mistake was to hold on to the SARS and MERS comparison for far too long. I should have considered all outbreaks, instead of focusing on coronavirus outbreaks at the outset. It took me until late February before I began reading up on the H1N1/09 outbreak timeline. At the same time, I began reading up on other flu pandemics, including the 1918 ‘Spanish Flu’ influenza pandemic, but I quickly set that one aside. That pandemic was made worse by trench warfare as a result of World War I; it also occurred before the influenza virus had been identified and before flu vaccines existed. It made no sense to use it as a frame of reference.
I stubbornly stuck to R0, an incomplete metric — This mistake was due to my lack of knowledge in viral epidemiology. Early on in the outbreak, Twitter freaked out over COVID-19’s basic reproduction number, or R0 (pronounced ‘R-nought’). Aforementioned poser-viral-epidemiologist Eric Feigl-Ding made a big fuss over COVID-19’s estimated R0 of 3.8, which led to a viral (heh) spread of obsession over the R0 number.
The R0 metric can be thought of as ‘the expected number of cases directly generated by one case in a population where all individuals are susceptible to infection’. As an example, SARS has an R0 of 2-5 in a naive population, measles an R0 of 12-18, and so on. But note that R0 is a statistical estimate given a model of spread in a population: the number of cases each case actually generates (strictly speaking, the effective reproduction number) can be brought down through mitigation and quarantine efforts. By the end of the SARS outbreak, for instance, the estimated reproduction number of SARS was a measly 0.4 due to strict government efforts (source).
So far so good. But it is incredibly easy to miss a more important nuance about R0. Put simply, R0 by itself cannot tell us the speed and extent of the spread. Consider this: COVID-19 has an estimated R0 of 2-3, with most estimates putting it around 2.3. SARS has an R0 of 2-5. And yet SARS was halted within two months, at roughly 8000 cases and 774 deaths, while COVID-19 continues to spread today, and currently counts 114,458 confirmed cases (with 4027 deaths) as I write this on the 10th of March 2020.
(To understand why this is the case, read Wilder-Smith, Chiew & Lee, 2020 in The Lancet, which contains a remarkably easy-to-read explanation of SARS vs COVID-19, and the differences in the two outbreaks.)
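One generic way to see that R0 alone does not fix the speed of spread is a toy calculation: hold R0 constant and vary the time between successive generations of infection. This is a deliberately crude sketch of unchecked geometric growth, not the argument made in the paper above. The R0 of 2.3 is the estimate quoted earlier; the 4-day and 8-day generation intervals are made-up values for illustration, not measured figures for SARS or COVID-19.

```python
import math

def cases_in_generation(days, r0, generation_interval_days, seed_cases=1.0):
    """Cases in the generation reached after `days`, assuming unchecked geometric growth."""
    generations = days / generation_interval_days
    return seed_cases * r0 ** generations

def doubling_time_days(r0, generation_interval_days):
    """Days for per-generation case counts to double under the same toy model."""
    return generation_interval_days * math.log(2) / math.log(r0)

R0 = 2.3  # estimate quoted in the text

# HYPOTHETICAL generation intervals, chosen only to show the effect.
fast = cases_in_generation(32, R0, generation_interval_days=4)
slow = cases_in_generation(32, R0, generation_interval_days=8)

print(f"same R0, 4-day interval, day 32: {fast:.0f} cases in that generation")
print(f"same R0, 8-day interval, day 32: {slow:.0f} cases in that generation")
print(f"doubling time at 4 days: {doubling_time_days(R0, 4):.1f} days")
print(f"doubling time at 8 days: {doubling_time_days(R0, 8):.1f} days")
```

Two outbreaks with an identical R0 can look very different a month in, which is the kind of thing I would have understood sooner had I read up on the metric before tracking it.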
At any rate, I did not know about these nuances when I first started. I think I spent over a month obsessively checking papers for the estimated R0 numbers for COVID-19. This was an absolute waste of time.
What could I have done differently? If I had the chance to do this again, I would have read up on the nuances of each of these metrics, in order to familiarise myself with the concepts so commonly used in each of these papers. I should have read textbooks, or contacted experts for clarification on basic concepts. I think this is a perfect example of ‘going slow to go quickly’; I could have saved myself a ton of time by doing my research before doing my research.
Earlier today, Philip Tetlock tweeted the following link:
These couple of months forecasting the coronavirus pandemic have left me intrigued. I’ve applied to the program, and I think if you’re interested and can spare the time, you might want to give it a go as well.
Ultimately, my forecasting practice is for very personal ends. This is as it should be. The net result of all this work is that when someone asks me — as a friend did, last week: “when do you think you’ll fly back to Singapore?” I answer: “I think there’s a 70% chance I’ll be back in the first week of April.”
And that feels a lot better than I thought it would.