An attempt at careful estimation of infection and mortality rates of COVID-19 in NY

Worldometers.info, which provides real-time statistics about world population and other topics, has put up a description of how it estimates the true infection fatality rate for SARS-CoV-2.


It is surprisingly difficult to estimate the true infection fatality rate (IFR = proportion of individuals infected with SARS-CoV-2 who die of it) because the available data are incomplete.


Taking official deaths/official cases as the estimate is naive and badly biased: it overestimates the death rate by a factor of roughly 5-10x.


The main sources of incompleteness in the data include:


1. Incomplete accounting of cases: the "official" incidence counts we see online are confirmed counts from individuals testing positive on viral tests. Testing is very incomplete: almost no one who is asymptomatic or has only mild symptoms is tested, and even many people with moderate or severe symptoms are not tested because of limited access to testing. The official number is therefore an extreme underestimate of the true number of SARS-CoV-2 infections. We can estimate the true case rate using antibody tests, provided that we (1) adjust for the false positive rate of the test, and (2) consider whether the sampling design of the study is likely to be representative or biased.

This website uses NY data to estimate that the true case count is 10x the official count, and I have found that ratio to be about right in many other places. They adjust based on serology data and seem to handle this factor well.
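To make the two steps in point 1 concrete, here is a minimal sketch of how an antibody survey result could be corrected for test error and then compared with the official case count. It uses the standard Rogan-Gladen correction, which is my choice for illustration and not necessarily what Worldometers did. The sensitivity, specificity, population, and confirmed case count below are hypothetical placeholders; only the 19.9% antibody-positive figure comes from the NYC study discussed in this post.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Correct a raw antibody-positive fraction for imperfect test accuracy."""
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must beat chance: sensitivity + specificity > 1")
    adjusted = (apparent_prevalence + specificity - 1.0) / denom
    # Clamp to [0, 1]: with low prevalence the raw estimate can go negative.
    return min(1.0, max(0.0, adjusted))


def undercount_ratio(true_prevalence, population, confirmed_cases):
    """Estimated true infections divided by officially confirmed cases."""
    return true_prevalence * population / confirmed_cases


# 19.9% is the NYC antibody figure quoted in this post.
# Sensitivity and specificity are HYPOTHETICAL test characteristics.
prev = rogan_gladen(0.199, sensitivity=0.90, specificity=0.98)

# Population and confirmed case count are HYPOTHETICAL round numbers.
ratio = undercount_ratio(prev, population=1_000_000, confirmed_cases=20_000)

print(f"adjusted prevalence: {prev:.1%}")
print(f"true infections per confirmed case: {ratio:.1f}")
```

Note that this only handles the test-error adjustment. It does nothing about caveat (2): if the survey sample is not representative, the corrected prevalence is still biased.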


2. Incomplete accounting of deaths: The official death count is also an undercount. If it includes only deaths of people with a confirmed viral test for SARS-CoV-2 whose death seems directly related, it will certainly be too low. In most of the USA (but not in many other countries), we have tried to improve this by including "likely COVID-19" deaths when the individual's symptoms and cause of death seem directly related. In many cases there are hallmark indications of COVID-19 (e.g. a chest X-ray showing damage to both lungs, concentrated at their outer boundary) that make this ascertainment easy. I expect that in most places this is done carefully, and NY keeps confirmed and likely death counts separate, so if you are skeptical you can consider both (e.g. if 90% of deaths were "likely" CoV-related and not confirmed, you might not believe the total). In NY, about 70% of reported COVID-19 deaths are confirmed, so the likely death counts are probably accurate. (In other words, drop the conspiracy speculation that the government is purposefully inflating death numbers to make covid sound worse than it is. This is not going on, at least not on a widespread scale.)


Also, even these measures undercount deaths: analyses of "excess deaths" relative to the same period last year, adjusted for other factors, suggest a further increase of 30% or so in deaths. They add that number into their calculation here. This may seem controversial, since it would include deaths only indirectly related to covid and more directly related to lockdowns (e.g. failure to promptly seek medical care for other conditions because of fear of going to the hospital, suicides perhaps related to lockdown-exacerbated depression, etc.). At the same time, many deaths are prevented by lockdowns (e.g. from car accidents, or from flu that would have spread more without lockdowns), and these may at least cancel those out for now.


Thus, I think they generally do an effective job of adjusting for this factor. Their estimate of the true covid-related death count is about 2x the official confirmed covid death count.
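The "about 2x" figure can be roughly reproduced from the two numbers quoted above. This is a back-of-the-envelope sketch and not the Worldometers calculation itself. It assumes that the 30% excess-death increase applies on top of the reported (confirmed plus likely) total, and that the naive ratio uses confirmed deaths only; both are my reading of the numbers.

```python
def death_multiplier(confirmed_share, excess_increase):
    """True deaths as a multiple of confirmed deaths.

    confirmed_share: fraction of reported deaths that are test-confirmed.
    excess_increase: extra deaths implied by excess-mortality analyses,
                     ASSUMED here to be relative to the reported total.
    """
    reported_over_confirmed = 1.0 / confirmed_share
    return reported_over_confirmed * (1.0 + excess_increase)


def naive_overestimate(case_multiplier, death_mult):
    """How much official deaths/official cases overstates the fatality rate."""
    return case_multiplier / death_mult


# 70% confirmed and ~30% excess deaths are the figures quoted in this post.
mult = death_multiplier(confirmed_share=0.70, excess_increase=0.30)

# 10x is the case undercount quoted in this post.
bias = naive_overestimate(case_multiplier=10.0, death_mult=mult)

print(f"true deaths / confirmed deaths: {mult:.2f}")
print(f"naive ratio overstates fatality by: {bias:.1f}x")
```

Under these assumptions the death multiplier comes out a little under 2, and the naive ratio overstates the fatality rate by a bit more than 5x, which is consistent with the 5-10x range given earlier.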


3. Incomplete accounting of recoveries: Even if cases and deaths are counted perfectly, it is still difficult to get an accurate measure of the IFR because deaths often occur weeks or even months after infection. Thus, at any given point in time, accurate death count/accurate case count will underestimate the death rate, because some proportion of those currently alive and infected will die of the disease in the coming weeks. In this Worldometers analysis, they compute the infection fatality rate as the ratio # dead/(# recovered + # dead) to adjust for this bias. They estimate the # of recoveries from the antibody study that found 19.9% of NYC residents had antibodies for the virus, taking this to mean that 19.9% of the city population had recovered. This is reasonable, but since that study did not do a follow-up viral test, a percentage of these people may have been actively infected, and some proportion of those may eventually die of the disease, so the recovery count may be a slight overestimate.
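The effect of unresolved infections is easy to see with a toy example. The sketch below compares deaths over all infections with deaths over resolved infections (deaths plus recoveries). All counts are made up purely for illustration.

```python
def naive_fatality(deaths, infections):
    """Deaths over all infections, including people who are still sick."""
    return deaths / infections


def resolved_fatality(deaths, recovered):
    """Deaths over infections with a known outcome (died or recovered)."""
    return deaths / (deaths + recovered)


# HYPOTHETICAL snapshot: 1000 infected so far, 90 of them still active.
infections = 1000
recovered = 900
deaths = 10

naive = naive_fatality(deaths, infections)
resolved = resolved_fatality(deaths, recovered)

print(f"naive:    {naive:.2%}")
print(f"resolved: {resolved:.2%}")
```

The resolved-outcome ratio is never smaller than the naive one, because it drops from the denominator the people whose outcome is not yet known.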


Based on these numbers, they estimated an IFR of 1.4% in NYC. This is considerably lower than the initially reported 3.0% case fatality rate based on Wuhan data. They also estimate the proportion of the total population who have died of covid, the "crude mortality rate" (CMR), which they find to be 0.28%.
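These two numbers are tied together by the antibody result: if the IFR is deaths divided by infections and the CMR is deaths divided by total population, then the CMR should equal the IFR times the infected fraction. A quick consistency check, using only figures quoted in this post:

```python
def crude_mortality_rate(ifr, infected_fraction):
    """CMR = deaths / population = (deaths / infected) * (infected / population)."""
    return ifr * infected_fraction


ifr = 0.014               # 1.4% IFR quoted for NYC
infected_fraction = 0.199  # 19.9% of NYC residents with antibodies

cmr = crude_mortality_rate(ifr, infected_fraction)
print(f"implied CMR: {cmr:.2%}")
```

The implied CMR rounds to the 0.28% they report, so the three figures are at least internally consistent.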


They also start to break down the IFR and CMR by age group. For the population under 65 (including those with and without preexisting conditions that raise the risk of death), the CMR estimate is 0.09%, which is about 1/3 of the rate for the whole population. This would suggest an IFR of 0.5% or so, i.e. about 1 death per 200 infections, for those under 65. They promise to do more analyses by age group and by comorbidity status, which should provide a clearer picture of the IFR and CMR for different groups.
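The step from the under-65 CMR to an under-65 IFR can be sketched the same way, by dividing the CMR by the infected fraction. This assumes that people under 65 were infected at the same 19.9% rate as the city as a whole, which is my simplifying assumption and may not hold.

```python
def ifr_from_cmr(cmr, infected_fraction):
    """Back out the IFR from the crude mortality rate and the infected fraction."""
    return cmr / infected_fraction


cmr_under_65 = 0.0009      # 0.09% CMR quoted for those under 65
cmr_overall = 0.0028       # 0.28% CMR quoted for the whole population
infected_fraction = 0.199  # ASSUMED to apply equally to the under-65 group

ifr_under_65 = ifr_from_cmr(cmr_under_65, infected_fraction)
share_of_overall = cmr_under_65 / cmr_overall

print(f"under-65 CMR relative to overall: {share_of_overall:.2f}")
print(f"implied under-65 IFR: {ifr_under_65:.2%}")
print(f"roughly 1 death per {round(1 / ifr_under_65)} infections")
```

Under that assumption the implied IFR is a little below 0.5%, in line with the "1/200 or so" figure above.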


The later posts on this Worldometers website show previous estimates based on Wuhan data. The analysis for NYC does an immensely better job of adjusting for the inherent missing data and biases, and so gets a more accurate estimate of the fatality rate.


©2020 by Covid Data Science.