58 cents or $200 – how much does a lost record cost?

Just how accurate can you be when estimating the cost of a lost record in a data breach? A spat between Verizon and data breach expert Larry Ponemon has thrown the question into sharp focus.

This figure is important to CISOs, who have to understand the impact of a cybersecurity risk in dollar terms when conducting a risk analysis. An accurate number is important to the validity of their risk model, which will be used to prioritise cybersecurity spending in different areas based on which threats could bring the company down. These models have real impact.

The Ponemon Institute makes its money estimating the cost of data breaches, and its founder has based his reputation on it. He speaks to companies and uses cost accounting to arrive at his figures, producing regular reports on the topic. But Verizon’s recent Data Breach Investigations Report criticised those figures, suggesting that the model Ponemon used to value data breach losses was off.

The report claimed a more accurate approach than Ponemon’s, one that is “based on actual data and considers multiple contributing factors (not just number of records).” It used data from cyber insurance carriers, provided by NetDiligence. The data set contained 191 insurance claims, which Verizon ran through a ‘cost per record’ model that it attributed to Ponemon. It arrived at a cost of 58 cents per record using this method, and argued that this figure was implausibly low, suggesting that a different methodology was necessary.

Verizon posited a “log-log regression model,” although it didn’t give details, instead promising more information in the coming year. It did produce a graph to show that its model outperformed Ponemon’s, plotting the 191 insurance claims on an X-Y axis and drawing each model’s fitted line through them.
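
Verizon hasn’t released the model itself, but the general technique is well known: fit a straight line to the logarithms of records lost and breach cost, which implies a power-law relationship between the two. A minimal sketch in Python, using entirely made-up claim figures:

```python
# Sketch of a log-log regression: fit a line in log space, which
# implies a power law, cost ~ e^a * records^b. The data below is
# invented for illustration; it is not Verizon's claims data.
import math

records = [100, 1_000, 10_000, 100_000, 1_000_000]     # hypothetical records lost
costs = [25_000, 70_000, 200_000, 550_000, 1_500_000]  # hypothetical losses ($)

xs = [math.log(r) for r in records]
ys = [math.log(c) for c in costs]

n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

# Ordinary least squares on the logged values
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx

def predict(records_lost: int) -> float:
    """Predicted breach cost under the fitted power law."""
    return math.exp(a) * records_lost ** b

print(predict(10_000))  # predicted cost of a mid-sized breach
```

Unlike a flat cost-per-record figure, the exponent `b` lets the cost curve flatten as breaches get bigger.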

It used R-squared (R²), a statistical measure of how close the data points are to the fitted regression line, to see how well its model fit the data. An R² score of zero would mean the model explains none of the variance in the data it is trying to describe. A score of 1 (which is practically non-existent in statistical studies) would indicate a perfect fit.
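
As a rough illustration of how the measure works – with toy numbers, not Verizon’s claims data – R² compares a model’s residual errors to the total variance in the data:

```python
# Compute R² for a simple least-squares line fit. Both the record
# counts and the costs below are invented for illustration.
records = [100, 500, 1_000, 5_000, 10_000]  # hypothetical records lost
costs = [20, 80, 150, 700, 1_600]           # hypothetical costs ($k)

n = len(records)
mean_x = sum(records) / n
mean_y = sum(costs) / n

# Fit cost = a + b * records by ordinary least squares
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(records, costs)) / sum(
    (x - mean_x) ** 2 for x in records
)
a = mean_y - b * mean_x

predicted = [a + b * x for x in records]
ss_res = sum((y - p) ** 2 for y, p in zip(costs, predicted))  # unexplained variance
ss_tot = sum((y - mean_y) ** 2 for y in costs)                # total variance
r_squared = 1 - ss_res / ss_tot

print(r_squared)  # 1.0 is a perfect fit; 0 explains nothing
```
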

The undisclosed model that Verizon used to describe the data losses from the 191 insurance claims had an R² score of 0.537, meaning it described only about half of the total variance in the data. Some factors affecting the cost of a breach, then, were not showing up in its model. So the firm created an estimate with upper and lower boundaries for expected losses based on the number of records lost in a breach.

In other words, Verizon’s model didn’t capture all the costs of a breach, but it still thought it was better than Ponemon’s. “Who wants a weak model that spits out a number that is all but guaranteed to be wrong? For that, you can just use a pair of D20 risk dice,” said the Verizon report. Ouch.

As his blog post demonstrates, Ponemon is not happy. First, he told SecTor, claims data isn’t a great way to measure breach costs, because claims don’t always cover the entire loss – especially in the case of larger breaches.

“Target was lucky enough to have $100m of data breach coverage,” he said. “Their total liability is estimated to be over $1bn. Therefore, if they’re lucky, they’ll get 10 cents on the dollar. If you rely on a claim, it would make it seem as though the total cost of a data breach for Target is $100m.”

His other gripe is that the R² evaluation Verizon performed isn’t appropriate for the data set it was using.

“You can do all of that, if you believe that the underlying data you collected was obtained through scientific sampling. That’s called a parametric statistic,” he said.

Scientific sampling, as found in surveys, selects a random base of people to represent an even distribution. But the kind of data sources used for breach cost analysis can’t be random, Ponemon argues: the Ponemon Institute interviews companies that it knows, because it has to gain their trust. Nor, he added, are 191 insurance claims records a parametric sample.

“We don’t do R2 analysis because it’s not scientifically generated,” said Ponemon, adding that neither should Verizon.

Verizon defended its report. “We looked at the data for signs of ‘ceilings’ (claims hitting an upper limit) but didn’t notice any consistent trend (but that doesn’t mean it isn’t there),” a spokesperson said. “And the distribution of loss didn’t exhibit any signs of artificial skew.”

It’s difficult to analyse Verizon’s model until the firm releases more details about it, but the firm suggests that its model is better able to predict the losses from small and large breaches, in addition to mid-sized ones.

Conversely, the Ponemon Institute deliberately restricts its model to companies that experienced breaches involving between 10 and 100,000 records. “We do that because mega breaches are rare events and they pull up the average and they potentially distort your model,” Ponemon said.
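
The averaging effect he describes is easy to reproduce. With made-up breach sizes, a single hypothetical mega breach drags the mean far beyond anything typical:

```python
# One mega breach dominating an average: all figures are invented.
breach_sizes = [12_000, 25_000, 40_000, 60_000, 90_000]  # mid-range breaches
mean_typical = sum(breach_sizes) / len(breach_sizes)

breach_sizes.append(80_000_000)  # a single hypothetical mega breach
mean_with_mega = sum(breach_sizes) / len(breach_sizes)

print(mean_typical, mean_with_mega)  # the mean jumps by orders of magnitude
```
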

He added that the cost per record typically falls during larger breaches because the fixed costs – hiring lawyers and digital forensics experts and so on – are spread over a larger number of records.

“A data breach of 50,000 records might be on average $200 per record. By the time you’re looking at millions of records, that same type of data breach to a large data set could be as low as $17 per record,” he said.
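
Those figures are consistent with a simple fixed-plus-variable cost structure. The dollar amounts below are assumptions chosen only to reproduce the shape of the effect Ponemon describes, not his actual cost breakdown:

```python
# Hypothetical breach cost model: a fixed outlay (lawyers, forensics,
# notification setup) spread over the records lost, plus a per-record
# handling cost. Both constants are illustrative assumptions.
FIXED_COSTS = 9_500_000   # assumed fixed incident costs ($)
PER_RECORD = 10.0         # assumed variable cost per record ($)

def cost_per_record(records_lost: int) -> float:
    return FIXED_COSTS / records_lost + PER_RECORD

print(cost_per_record(50_000))     # small breach: fixed costs dominate
print(cost_per_record(5_000_000))  # mega breach: cost per record collapses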

So, where does this leave companies trying to predict the potential cost of a data breach? CISOs may have more to discuss after they actually see Verizon’s model, whereas Ponemon’s is a known quantity. And Ponemon’s concentration on mid-range breach sizes seems to have validity even in Verizon’s model.

“Their model has a range from very low to very high, and our number would be in fact less than the very high range,” he said. “If you take our numbers, from our report, it’s basically within their confidence interval. In other words, it’s a number that they would predict.”

If you’re not planning on losing any more than 100,000 records, then, Ponemon’s figures can help you – and he’s not claiming to advise you if you lose more than that. But no matter how many records you lose, your mileage may vary when it comes to losses, thanks to a number of factors.

There are, after all, costs that may vary between specific companies, warns Alexander Rau, Symantec Canada’s national information security strategist. Will you have to replace your CISO if you get breached? How long will that take and how much will it cost? Will a regulator fine you, as it did AT&T recently? Are you looking at a class action suit? All of these factors and more will shift the cost per record.

“As difficult as it is, there are tools out there to help you with that number, but take it with a grain of salt,” says Rau. The best reports will be good indicators, but you may still have to allow for some wiggle room. “Try with the best of your abilities to guess as close as possible,” concluded Rau, “but you will never hit it.”