Predicting the pandemic

At the end of September, Ontario Premier Doug Ford was predicting a second COVID-19 wave. “We don’t know how bad it will be,” he said – but Stephan Jou had a pretty good idea.

The data scientist, who specialises in using artificial intelligence for cybersecurity, had developed a tool with his team to predict the number of new COVID-19 cases and deaths. He had the numbers for Toronto, and they were grim.

“We saw the second wave coming,” he told us on October 8. “Two weeks ago, we predicted that we were about four weeks away from hitting the same death rate and case rate as New York at its peak, which was very scary.” At the time of writing, the AI-powered tool predicts that Toronto will hit NYC’s peak in 20 days.

Jou is the CTO at Interset, an AI development company that Micro Focus acquired last year. Based in Ottawa, the company builds behavioural analytics software that applies machine learning to employee activity. Jou, who previously worked at IBM and Cognos and served as an advisor to Canada’s Natural Sciences and Engineering Research Council, has also consulted with the Privacy Commissioner of Canada on the regulation of AI for data privacy.

Predicting the pandemic

The system, which Jou will launch officially at his SecTor talk on October 21, charts COVID-19 cases and deaths across Canada and the rest of the world. It also goes further by using AI to predict what will happen next.

“There are trackers and sites out there, but they show you what happened in the past,” he says, calling them pretty but not predictive. The tools that do offer predictions focus on US state-level data, and there’s little on offer for forecasting provincial figures north of the border.

“So we wanted to build something that was predictive, and beautiful, and Canadian,” he adds.

He and his team did this in their spare time, taking evenings and weekends to build a user interface that looks to both the past and the future. You can use the free site to analyse new case and death figures for the novel coronavirus worldwide.

You can compare that data across regions over time, using either absolute numbers or figures adjusted for population (per million people). That matters when comparing, for example, Canada with the US, which has about 8.5 times as many people.
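Adjusting for population is a simple rescaling: divide a count by the population, then multiply by one million. The Python sketch below shows the arithmetic. The function name `per_million` is hypothetical, and the case and population figures are round, illustrative numbers, not real data and not the site's code.

```python
def per_million(count: float, population: int) -> float:
    """Express a raw count as a rate per one million residents."""
    if population <= 0:
        raise ValueError("population must be positive")
    # Multiply first so whole-number inputs stay exact where possible.
    return count * 1_000_000 / population


# Illustrative, rounded figures only (not real case data): region B has
# 8.5 times the population of region A and 8.5 times the raw cases.
region_a = per_million(1_000, 38_000_000)
region_b = per_million(8_500, 323_000_000)
print(f"A: {region_a:.1f} per million, B: {region_b:.1f} per million")
```

An 8.5-fold gap in raw counts disappears once both regions are put on the same per-million footing, which is why the adjusted view is the fairer comparison.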

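The site displays graphs on either a logarithmic or a linear scale. The logarithmic scale is especially useful for exponential growth, such as the spread of pandemic infections, because equal steps up the axis represent equal multiples (each gridline might mark a tenfold increase). A count that keeps multiplying at the same rate therefore plots as a straight line rather than a curve that shoots off the top of the chart. How fast infections multiply is governed by the reproduction number, commonly known as the R score: the average number of people each infected person goes on to infect. An R score of 15 (roughly that of measles) means that every infected person will infect 15 others on average. An R of 1 means that each infected person infects only one other.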
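To see why exponential growth looks straight on a log scale, take the simplest textbook model, in which every case infects R others in the next generation, so generation n has initial × Rⁿ cases. The sketch below assumes that simplified model (it ignores immunity, interventions and timing), and its helper names are hypothetical. It prints the raw counts for R = 15 and the vertical step between consecutive points on a log10 axis.

```python
import math


def generation_counts(initial: float, r: float, generations: int) -> list[float]:
    """New infections per generation in a simple model where each case infects r others."""
    return [initial * r**n for n in range(generations + 1)]


def log_steps(values: list[float]) -> list[float]:
    """Vertical distance between consecutive points on a log10 axis."""
    return [math.log10(b) - math.log10(a) for a, b in zip(values, values[1:])]


counts = generation_counts(initial=1, r=15, generations=4)
print(counts)            # linear axis: 1, 15, 225, 3375, 50625
print(log_steps(counts))  # log axis: every step is log10(15), about 1.176
```

On a linear axis the last point dwarfs the rest; on a log axis the points are evenly spaced, so any change in the slope (a change in R) is easy to spot.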

“The only number in which it is flat is when R is exactly equal to 1,” Jou says of the infection rate. “Even if it’s .00001 greater than one (1.00001), it’s still going to grow exponentially.”
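The same simplified model (initial × Rⁿ cases after n generations) makes Jou's point concrete. Solving initial × Rⁿ = 2 × initial gives the number of generations needed to double, n = ln 2 / ln R. That is finite for any R above 1, however slight, and never arrives at R = 1 or below. The function name is hypothetical, and real epidemics are far less tidy than this model.

```python
import math


def generations_to_double(r: float) -> float:
    """Generations until new infections double when each generation is r times the last.

    Solves initial * r**n == 2 * initial for n, i.e. n = ln 2 / ln r.
    Returns math.inf for r <= 1: the curve is flat at r == 1 and shrinks below it.
    """
    if r <= 1:
        return math.inf
    return math.log(2) / math.log(r)


for r in (0.9, 1.0, 1.00001, 1.1, 2.0):
    n = generations_to_double(r)
    if math.isinf(n):
        print(f"R = {r}: never doubles")
    else:
        print(f"R = {r}: doubles after {n:,.1f} generations")
```

At R = 1.00001 doubling takes tens of thousands of generations, which is slow but still exponential: the count never levels off the way it does at exactly R = 1.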

The Interset team’s tool quantifies that growth, using statistical algorithms to chew through current and historical data and project likely and possible outcomes two weeks into the future. It shows a projected average line along with best- and worst-case thresholds. At the time of writing, Ontario’s daily case numbers per million are still trending up and, in the worst-case projection, look set to double within two weeks. Daily deaths are also projected to rise.
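The article doesn't say which statistical algorithms the Interset tool uses, so the following is only a generic stand-in, not their method. It fits a straight line to the logarithm of recent daily counts by ordinary least squares (so the slope is the daily growth rate), extends that line 14 days, and forms a crude best/worst band by shifting the growth rate up or down by two standard errors. The function names, the 21-day synthetic history and the z = 2 band width are all assumptions for illustration.

```python
import math


def fit_log_linear(daily_counts: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of log(count) = intercept + slope * day.

    Returns (intercept, slope, slope_standard_error); slope is the daily
    exponential growth rate.
    """
    n = len(daily_counts)
    if n < 3 or min(daily_counts) <= 0:
        raise ValueError("need at least 3 positive daily counts")
    xs = range(n)
    ys = [math.log(c) for c in daily_counts]
    x_mean, y_mean = (n - 1) / 2, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, math.sqrt(sse / (n - 2) / sxx)


def project(daily_counts: list[float], horizon: int = 14, z: float = 2.0):
    """Extrapolate the fitted trend `horizon` days ahead.

    Returns (central, best, worst) daily projections. "Best" uses the lower
    growth rate (fewer cases). The band shifts the rate by +/- z standard
    errors: a rough heuristic, not a calibrated prediction interval.
    """
    intercept, slope, slope_se = fit_log_linear(daily_counts)
    start = intercept + slope * (len(daily_counts) - 1)  # fitted log-count on the last day

    def path(rate: float) -> list[float]:
        return [math.exp(start + rate * h) for h in range(1, horizon + 1)]

    return path(slope), path(slope - z * slope_se), path(slope + z * slope_se)


# Synthetic series (not real data): about 5% daily growth with alternating noise.
history = [100 * 1.05**d * (1.03 if d % 2 else 0.97) for d in range(21)]
central, best, worst = project(history)
_, rate, _ = fit_log_linear(history)
print(f"doubling time: {math.log(2) / rate:.1f} days")
print(f"day 14: best {best[-1]:.0f}, central {central[-1]:.0f}, worst {worst[-1]:.0f}")
```

With about 5% daily growth, the central line nearly doubles over the two-week horizon, since 1.05¹⁴ ≈ 1.98. A log-linear fit is about the simplest projection that respects exponential growth, but it assumes the recent growth rate persists, which is exactly what interventions change.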

To make its assessments, the tool draws on data from Johns Hopkins University, which in turn gathers data from public health sources around the world. It also collects data from several sources on public policy changes made in response to the pandemic. Oxford University provides country-level intervention data, while the Community Commons COVID Analysis and Mapping of Policies (AMP) database offers sub-national data. Finally, the Interset team wrote its own crawlers to harvest intervention data from governments across Canada.
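Johns Hopkins' public time-series files report cumulative totals, so charting new daily cases starts with differencing consecutive days. A minimal sketch, assuming one region's totals are already loaded as a list of integers; the function name and the figures are hypothetical.

```python
def daily_new(cumulative: list[int]) -> list[int]:
    """Turn cumulative totals into daily new counts.

    Day 0 has no previous total, so the output is one element shorter.
    Downward revisions in the source data would produce negative values;
    this sketch clamps them to zero, one simple choice among several.
    """
    return [max(today - yesterday, 0) for yesterday, today in zip(cumulative, cumulative[1:])]


# Hypothetical cumulative totals for one region (not real figures).
totals = [100, 130, 170, 165, 220]
print(daily_new(totals))  # [30, 40, 0, 55]
```

Clamping means the daily figures no longer sum exactly to the overall cumulative change when a revision occurs; a more careful pipeline might redistribute corrections instead.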

The team doesn’t manually tweak the numbers based on intervention data. It doesn’t need to, Jou explains: if government policy (or the lack of it) changes the infection or death rate, the change will show up in the statistics within 14 days, and the statistical algorithms will incorporate it into their predictions.
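One way an automated model can absorb a policy change without manual tweaks is to keep re-estimating the growth rate from a trailing window of recent data. The sketch below illustrates that idea only; it is not Interset's algorithm. It computes the average daily log-growth over each 14-day window of a synthetic series in which a hypothetical intervention on day 30 turns 5% daily growth into a 2% daily decline. The synthetic series has no reporting delay, so the only lag here comes from the window itself.

```python
import math


def rolling_growth(daily_counts: list[float], window: int = 14) -> list[float]:
    """Average daily log-growth over each trailing window.

    Entry i covers days i .. i + window, so each estimate uses only recent data.
    """
    return [
        (math.log(daily_counts[i + window]) - math.log(daily_counts[i])) / window
        for i in range(len(daily_counts) - window)
    ]


# Synthetic series (not real data): 5% daily growth, then a hypothetical
# intervention on day 30 turns it into a 2% daily decline.
counts = [100 * 1.05**min(d, 30) * 0.98**max(d - 30, 0) for d in range(60)]
rates = rolling_growth(counts)
for i in (0, 16, 23, 30, 45):
    print(f"days {i}-{i + 14}: {rates[i]:+.4f} per day")
```

Windows that straddle day 30 show a blend of the two rates; once a full 14 days have passed after the change, the estimate reflects only the new, declining rate, with no one editing the inputs.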

How to apply AI to security

What does all this have to do with security? As a data scientist, Jou sees several principles that should guide the application of AI in any discipline, whether the pathogens being tracked are physical or virtual.

He’ll cover these principles in more detail during his talk, but they have one thing in common: they advocate the responsible use of AI. That means not overselling it, and focusing on realistic outcomes rather than fancy terminology.

That in turn means using the right tools for the job. Is your tool’s AI output understandable enough to explain why you relied on it to take legal action? Don’t be dazzled when a vendor tells you that a tool uses deep learning, Jou advises – it isn’t always appropriate (and he didn’t use it for this project). Make sure your security team has enough technical training to understand the underlying mechanics of a vendor’s AI.

Jou is an AI expert who has proven his skills with a timely and useful tool. Security pros wanting to cut through the AI snake oil should take in his talk at SecTor’s virtual event later this month. It will demonstrate what the technology can do when used wisely, and explain more about how to evaluate AI claims for yourself.