What proportion of the US population is susceptible, & what does it mean for possible herd immunity?
Updated: Oct 4
A study was just published in the Lancet on September 25 presenting results from the largest nationwide study to date assessing the prevalence of SARS-CoV-2 antibodies in the USA, giving us valuable information about what proportion of the country has already been exposed and is presumably immune to the coronavirus. In this post, I will summarize the findings of this study, pointing out its strengths and limitations; try to assess what it suggests about the prevalence of SARS-CoV-2 infections in the USA right now, in late September; and make a few comments about what we do and don't know about immunity and what this all means for the promise of herd immunity.
Warning: important point of context -- In this post, I will be mentioning some evidence supporting the potential for higher immunity levels in the USA population than many acknowledge, but I do not IN ANY WAY support the so-called "herd immunity" strategy, e.g. as seems to be promoted by the president's new advisor Atlas. This strategy fails to acknowledge the importance of basic mitigation strategies like avoiding crowded indoor areas, distancing or mask wearing, and for "low risk groups" it treats distancing as a bad idea; in Atlas' own words, "socializing among those low risk groups represents an opportunity for developing widespread immunity and eradicating the threat". I call this the "Disney Frozen" strategy ("Let it go, let it go, don't hold it back any more ..."). I strongly believe this is a foolish and dangerous idea that would likely lead to a great deal of unnecessary suffering and death from COVID-19, would ultimately not work, and is unnecessary under any circumstances. I'll explain why in the conclusion.
I present the information in this post because I think it is scientifically reliable, and I believe that it is important to consider all available facts in our attempt to synthesize information and more fully understand the pandemic. I fully support continued vigilance and the importance for all of us to continue to follow the basic precautionary steps to constrain viral spread, including (1) the avoidance of crowded, enclosed, indoor locations, especially if potentially poorly ventilated, (2) physical distancing when indoors, and (3) wearing masks when around other people.
However, I disagree with the notion of not talking about certain facts because they may be misconstrued and taken too far by some with incomplete understanding -- my preference is to consider all the scientifically supported facts and try to put them in their proper perspective.
Important background: the difference between confirmed cases and infections
In order to understand the implications of this study, it is important to recognize that the official COVID-19 case numbers are only a small proportion of total SARS-CoV-2 infections in the country. Cases are SARS-CoV-2 infections confirmed with a positive PCR test; they do not include infections in individuals who were never tested, who had a false negative test result (which is disturbingly common), or who were possibly found positive using an antigen test rather than a PCR test. The number of uncounted infections, those without a positive PCR test, is considerable given the lack of available testing during much of the pandemic and the fact that most infected individuals remain asymptomatic or have only mild symptoms and so are unlikely to be tested.
The total number of confirmed cases in the USA is currently 7 million, but the number of people who have been infected by the virus is much larger. Serology studies help estimate this number by obtaining blood samples from the general population and using antibody tests to find SARS-CoV-2-specific antibodies that would indicate a past infection; antibodies tend to remain in the bloodstream long after an infection is cleared. For a well-designed study whose sample is representative of the population, the proportion of positive antibody tests (the seropositivity rate, or seroprevalence) provides an estimate of the prevalence of past SARS-CoV-2 infections in the population.
Summary of study design and results
This Lancet study is a cross-sectional study measuring SARS-CoV-2 antibody levels in the blood of dialysis patients across the country. The scope and design of the study are impressive: it obtained samples from >1300 dialysis centers all over the country and randomly sampled 28,503 adult patients from these centers, stratifying by region, age, sex, race and ethnicity. This type of rigorous sampling design is the key to obtaining a representative sample that allows the study to draw inferences about the reference population, overcoming the bias inherent in most other antibody studies, whose sampling was done in a more ad hoc and less systematic fashion. This rigor justifies its publication in the Lancet, broadly considered one of the top three medical journals in the world.
The seroprevalence in their sample of dialysis patients was 8.0%. They broke down these estimates by region and by state, and found that the seroprevalence in the USA adult dialysis population ranged from 3.5% in the West to 27.2% in the Northeast. Following is the figure from their paper presenting seropositivity estimates by state.
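For readers unfamiliar with stratified random sampling, here is a minimal Python sketch of the general idea, assuming proportional allocation (each stratum contributes a share of the sample equal to its share of the sampling frame). This is not the Lancet study's actual sampling procedure or allocation scheme; the function name `stratified_sample` and the toy frame of patients are hypothetical and only illustrate the mechanics.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(frame, key, n_total, seed=0):
    """Draw a proportionally allocated stratified random sample.

    frame: list of sampling units (here, hypothetical patient records)
    key: function mapping a unit to its stratum label (e.g. region)
    n_total: desired overall sample size; per-stratum sizes are rounded,
             so the realized total can differ slightly from n_total
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in frame:
        strata[key(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Each stratum gets a share of the sample matching its share of the frame
        n_h = round(n_total * len(members) / len(frame))
        sample.extend(rng.sample(members, min(n_h, len(members))))
    return sample


# Toy sampling frame: 1,000 hypothetical patients spread unevenly across regions
regions = ["Northeast"] * 400 + ["South"] * 300 + ["Midwest"] * 200 + ["West"] * 100
frame = [{"id": i, "region": r} for i, r in enumerate(regions)]

sample = stratified_sample(frame, key=lambda p: p["region"], n_total=100)
print(Counter(p["region"] for p in sample))
```

Because every stratum is guaranteed its share, the sample mirrors the frame's regional mix by design rather than by luck, which is what an ad hoc convenience sample cannot promise.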
Note how high the numbers are in the northeast, Illinois and Louisiana that were hit hard by the pandemic in the Spring, and how low they are in the south and west that were hit hard by the pandemic in the Summer. More on that later.
Recognizing that the population of adult USA dialysis patients is not the same as the entire USA adult population, they standardized their results to the entire USA adult population by adjusting for differences in demographic factors (region, age, sex and race) between the dialysis and general populations. These standardized results suggested a seroprevalence of 9.3% in the overall adult population. Their study also estimated that only 9.2% of seropositive patients had been formally diagnosed by a PCR test, suggesting that the number of infections was about 10.9x the number of confirmed cases (1/0.092 ≈ 10.9) and affirming the roughly 10:1 ratio of infected to confirmed cases that has been suggested in numerous other serology studies.
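To make the standardization step concrete, here is a minimal Python sketch of direct standardization along a single demographic dimension (age). The stratum-specific rates and age mixes are made-up illustrative numbers, not values from the Lancet paper, which adjusted for several factors at once; the sketch only shows the mechanics of reweighting stratum-specific rates to a target population's composition. The last lines compute the infected-to-confirmed ratio implied by the reported 9.2% diagnosed fraction.

```python
def direct_standardize(stratum_rates, target_weights):
    """Reweight stratum-specific rates to a target population's stratum mix."""
    total_weight = sum(target_weights.values())
    return sum(stratum_rates[s] * w for s, w in target_weights.items()) / total_weight


# HYPOTHETICAL seroprevalence by age band within the sampled dialysis patients
rates = {"18-44": 0.12, "45-64": 0.09, "65+": 0.06}

# HYPOTHETICAL age mixes: dialysis patients skew older than adults overall
dialysis_mix = {"18-44": 0.15, "45-64": 0.40, "65+": 0.45}
general_adult_mix = {"18-44": 0.45, "45-64": 0.33, "65+": 0.22}

crude = direct_standardize(rates, dialysis_mix)              # rate in the sample as drawn
standardized = direct_standardize(rates, general_adult_mix)  # rate reweighted to all adults
print(f"crude {crude:.1%}, standardized {standardized:.1%}")

# Infected-to-confirmed ratio implied by 9.2% of seropositives being diagnosed
infected_per_confirmed = 1 / 0.092
print(f"about {infected_per_confirmed:.1f} infections per confirmed case")
```

With these made-up numbers the standardized rate comes out higher than the crude rate, because the younger, higher-prevalence strata carry more weight in the general adult population; the direction and size of the real adjustment depend on the actual stratum rates.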
Unfortunately, it is clear that dialysis patients are not representative of the public as a whole.
While the demographic adjustment is nice, it is likely not sufficient, as there are other clinical and lifestyle factors in the dialysis subpopulation that likely affect their risk of exposure and their susceptibility differently. They are more likely to be immunocompromised and would have frequently traveled to the hospital during the pandemic, spending considerable time indoors with strangers while receiving dialysis. These factors would suggest a higher risk of infection. However, as a known vulnerable group they may also be more likely than the general population to isolate indoors and avoid large crowds, and one would expect dialysis centers to have systems in place to prevent exposure to the virus during dialysis. Thus, it is not immediately clear whether the dialysis subpopulation should have a higher or lower incidence than the general population, but this caveat must be kept in mind when drawing conclusions about the USA adult population from this study.
Misleading headlines and inaccurate conclusions
This study has received quite a bit of attention in the news media this weekend, but as is sadly so often the case, the headlines fail to accurately portray the conclusions of the study. Not only are they imprecise, they are downright inaccurate and blatantly misleading.
A vast majority of headlines portray the study as demonstrating that <10% of the USA population has been infected, for example: "Study: Fewer than 10% of Americans have coronavirus antibodies" by Axios, "Only 10% of adults may have COVID-19 antibodies: Study" by ABC, "Fewer than 10% in the US have antibodies to the novel coronavirus" by CNN, "Fewer than 10% of Americans show signs of past coronavirus infection, study finds" by CNBC, "Fewer than 10% of US adults were found to have coronavirus antibodies, according to a new study" by Insider, and "U.S. far from reaching herd immunity, with less than 10% of adults showing virus antibodies: study" by the NY Daily News. These numbers agree with CDC director Redfield's recent remarks that more than 90% of the US population remains susceptible to SARS-CoV-2 infection, based on preliminary results of a CDC serological study, for which he mentioned that updated results will be available any time now.
The study found that 8.0% of adult dialysis patients had antibodies (and thus evidence of prior SARS-CoV-2 infection) and estimated that 9.3% of the USA adult population had antibodies after adjusting for demographic differences. So on the surface, these headlines seem accurate. Why, then, are they blatantly misleading? They speak in the present tense and don't mention the time frame of the study. Here is a plot of the distribution of dates of sample collection for this study from the paper:
These samples were collected in July, with a vast majority of them collected in the first week of the month, and the median sample date July 6th.
On July 6th, the USA had 2,932,813 confirmed cases (from covidtracking.com), and as of September 25th, the USA had 6,997,437 confirmed cases. Thus, there are roughly 2.4 times as many confirmed cases in the USA today as there were when the samples for this study were taken, and the 8.0%/9.3% seropositivity rates reported in this study indicate the state of affairs in JULY, not NOW. A few news reports (but not enough!) have included this key context in their headlines, including CBS: "Less than 10% of Americans had coronavirus antibodies as of July, study finds", US News and World Report: "Study Suggests Fewer than 10% of U.S. Population Had Coronavirus Antibodies in July", and the Wall Street Journal, which has an incomplete primary headline, "Coronavirus Antibodies Found in Small Portion of Americans, Study Says", but a clarifying secondary headline: "Less than 10% of 28,000 dialysis patients in the U.S. had pathogen-fighting antibodies as of July, researchers found."
It is clear that the SARS-CoV-2 prevalence is considerably higher now than it was in early July, especially given the massive summer surge in the southern and western USA that was of greater magnitude than the initial northeastern surge driven by NYC and its surrounding areas.
Here I consider two ways to standardize these results to provide an idea of what the current seroprevalence levels might be.
If we assume that seroprevalence has increased in the same proportion as confirmed cases, i.e. 2.4-fold from July to now, then we would estimate that at least 19.2% of the USA adult dialysis population (8.0% × 2.4) and about 22.3% of the total adult USA population (9.3% × 2.4) have been infected.
If we instead assume that only 9.2% of infections are counted as confirmed cases, as this study found was true in July, then 7 million confirmed cases would correspond to roughly 76 million infections, which would represent about 23% of the total USA population.
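Both back-of-the-envelope updates can be reproduced in a few lines of Python using the figures quoted in this post. The one input not stated in the text is the US population, which I take as roughly 328 million (an approximate 2019 figure, labeled as an assumption in the code). Treat the outputs as rough sketches under these assumptions, not as new estimates.

```python
# Figures quoted in the text
confirmed_jul6 = 2_932_813
confirmed_sep25 = 6_997_437
sero_dialysis_jul = 0.080   # seroprevalence in sampled dialysis patients (July)
sero_adults_jul = 0.093     # standardized to the US adult population (July)
frac_diagnosed = 0.092      # share of seropositives with a formal diagnosis

# ASSUMPTION: approximate total US population (2019 estimate)
us_population = 328_000_000

# Method 1: scale July seroprevalence by growth in confirmed cases
growth = confirmed_sep25 / confirmed_jul6
method1_dialysis = sero_dialysis_jul * growth
method1_adults = sero_adults_jul * growth

# Method 2: infer infections from confirmed cases via the diagnosed fraction
implied_infections = confirmed_sep25 / frac_diagnosed
method2_population = implied_infections / us_population

print(f"case growth since July 6: {growth:.2f}x")
print(f"method 1: dialysis {method1_dialysis:.1%}, adults {method1_adults:.1%}")
print(f"method 2: {implied_infections / 1e6:.0f} million infections, "
      f"{method2_population:.1%} of the population")
```

Using the exact case ratio (about 2.39) instead of the rounded 2.4 lowers the method 1 figures by about a tenth of a percentage point, a reminder that these numbers are far less precise than their decimal places suggest.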
Like everything involving data analysis during this pandemic, there are caveats to any estimator, especially given the changing testing and reporting practices from state to state and over time. However, it seems evident that the proportion of USA adults exposed and infected enough to produce antibodies is currently far greater than 10%, and likely on the order of at least 20%.
So what about CDC director Redfield's remarks that preliminary results of their nationwide serology study found <10% seropositivity? He did not elaborate on the specifics of these results, nor the timing of sample collection. In his comments to Congress when he made these remarks on 9/23, he said the next round of results from this study would be available in a week or so, i.e. by early October. Unless the population of dialysis patients in the USA is at much higher risk of exposure, such that they have double the infection rate of the general population, I expect these overall seropositivity numbers to come in far higher than 10% and likely close to 20%; if not, I recommend checking the date of sampling in that study. It is possible that much of the sampling for that study was also done months ago, in which case some type of adjustment for increasing infections over time, like the one I attempted here, would be needed. Stay tuned, as that study will likely be much more representative of and relevant to the general USA population than the dialysis study.
Updated State-by-State Estimates
Just for fun, I did an adjusted analysis based on the results of this study to produce updated state-by-state estimates of the prevalence of SARS-CoV-2 infections.
To do so, I assembled a spreadsheet based on the state-by-state seropositivity estimates from this study, testing and case data from covidtracking.com, and estimated population data for 2019. The spreadsheet contains the following columns for each state:
Seropositivity estimate for dialysis population from the Lancet study
Case counts on 7/6/20
Case counts on 9/25/20
Estimated population in 2019
From these, I produced updated estimates of seropositivity for the dialysis population on 9/25/20, and computed several other summaries:
Took the ratio of cases on 9/25/20 to cases on 7/6/20 for that state
Multiplied this ratio by the 7/6/20 seropositivity estimate to get an estimate of SARS-CoV-2 seroprevalence on 9/25/20.
Computed the ratio of seropositive individuals to confirmed cases as an estimate of the infected-to-case ratio
I conservatively used the median Seropositive/Confirmed ratio of 5.2 across states to provide estimates of SARS-CoV-2 prevalence on 9/25/20 for states with missing or 0% seropositivity estimates in the Lancet study (Idaho, Kansas, Montana, Nevada, North Dakota, Rhode Island, South Dakota, West Virginia and Wyoming)
I also added some other information about testing prevalence (ratio of total tests to population) and testing positivity (ratio of positive to total tests) to the spreadsheet for some exploratory analyses that I will not get into here.
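The per-state procedure listed above can be sketched in Python as follows. This is a hypothetical re-implementation, not my actual spreadsheet: the three states and all of their numbers are placeholders, and the fill-in ratio is the median computed from the placeholder rows rather than the 5.2 I used.

```python
from statistics import median

# HYPOTHETICAL placeholder rows:
# (July seroprevalence or None, cases on 7/6, cases on 9/25, population)
states = {
    "State A": (0.10, 50_000, 100_000, 5_000_000),
    "State B": (0.04, 20_000, 60_000, 4_000_000),
    "State C": (None, 5_000, 15_000, 1_000_000),  # missing seroprevalence estimate
}


def update_estimates(rows):
    """Scale July seroprevalence by case growth; fill gaps with the median ratio.

    Assumes at least one state has a usable (non-missing, non-zero) estimate.
    """
    estimates, ratios = {}, {}
    for name, (sero_jul, cases_jul, cases_sep, pop) in rows.items():
        if not sero_jul:  # missing or 0% estimate: handled below
            continue
        growth = cases_sep / cases_jul           # case growth since July
        estimates[name] = sero_jul * growth      # updated seroprevalence
        # Seropositive-to-confirmed ratio (identical at either date,
        # since both numerator and denominator scale by the same growth)
        ratios[name] = estimates[name] * pop / cases_sep
    fill_ratio = median(ratios.values())
    for name, (_, _, cases_sep, pop) in rows.items():
        if name not in estimates:
            # Fallback: apply the median infected-to-case ratio to current cases
            estimates[name] = fill_ratio * cases_sep / pop
    return estimates, ratios, fill_ratio


estimates, ratios, fill_ratio = update_estimates(states)
for name, est in estimates.items():
    print(f"{name}: {est:.1%}")
```

Because the update simply multiplies by case growth, each state's implied infected-to-case ratio is frozen at its July value; that is the constant-ratio assumption whose weaknesses are discussed below.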
Here is a link to the spreadsheet:
Here are the estimates of SARS-CoV-2 prevalence percentages on 9/25/20 for the adult dialysis population based on these assumptions:
These are estimates for the dialysis population. Recall that in the Lancet paper, the demographic adjustments that standardized these numbers to the entire USA adult population increased the seropositivity estimate by roughly 16% (from 8.0% to 9.3% nationally), so if demographic adjustments were applied, these numbers would likely be slightly higher for the entire adult USA population.
Now, these numbers should be taken with a grain of salt, as they are merely an attempt to update the results in the Lancet paper based on the progression of the pandemic since July, to give a general idea of what these numbers might look like now. They assume that the state-specific ratios of seropositive individuals to confirmed cases have remained constant from July to September, and numerous changes in the pandemic since July, including changes in testing and reporting practices in many states, make this assumption questionable. From exploratory investigations, I suspect that these estimates tend to overestimate the prevalence for states that surged in the spring (Northeast, Washington, Louisiana) and underestimate it for states that surged in the summer after this study's sampling (South, Southwest, most of the Midwest), but since I have not done any more rigorous analyses, I will not attempt to further refine these estimates.
While far from perfect, these state-specific standardized estimates are almost certainly a more accurate picture of the current state of the pandemic than the figures in the paper summarizing estimates in July.
The notion that upwards of 20-25% of the country might have been exposed, infected and immune removes a substantial proportion of the USA from the susceptible subpopulation. However, this is not even close to the threshold needed for herd immunity, as I will now explain.
What is Herd Immunity and how close are we to it?
The concept of herd immunity was introduced in the vaccine development community to describe the mechanism by which the introduction of a vaccine gradually eliminates the potential for epidemic growth. As a higher proportion of society is removed from the susceptible pool through infection, death, or vaccination, the population of susceptible individuals to whom the virus could spread decreases. This decrease suppresses transmission of the virus in the population, and at some point reaches a threshold at which the virus can no longer sustain epidemic spread. This threshold depends on the basic reproduction number R0, the average number of people infected by each infected person in a fully susceptible population, with the herd immunity threshold given by 1-1/R0. For SARS-CoV-2, R0 is estimated at 2.5-3.0, which corresponds to a herd immunity threshold of 60-67%, meaning that the susceptible share of the population must fall to roughly 33-40%. Thus, although the prevalence of infection in the USA is certainly greater than the 8-9% reported in the Lancet study, even the 20-22% updated estimate is still far below the herd immunity threshold, since it suggests 75-80% of the population remains susceptible.
However, as with many things about this coronavirus, it is not that simple, and there are some caveats about herd immunity that are important to understand. Some of these caveats provide support for the notion that the susceptible population might be even smaller.
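The threshold formula is simple enough to check directly. Here is a minimal Python sketch, assuming the homogeneous-mixing, random-vaccination setting in which 1-1/R0 holds; `herd_immunity_threshold` is just an illustrative helper name.

```python
def herd_immunity_threshold(r0):
    """Classic threshold 1 - 1/R0 (homogeneous mixing, random immunity)."""
    if r0 <= 1:
        return 0.0  # no epidemic growth to begin with
    return 1 - 1 / r0


for r0 in (2.5, 3.0):
    hit = herd_immunity_threshold(r0)
    print(f"R0={r0}: threshold {hit:.0%}, susceptible share at threshold {1 - hit:.0%}")
```

The two R0 values quoted above give thresholds of 60% and about 67%, leaving 40% and about 33% of the population susceptible at the threshold.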
Variability in susceptibility and risk of exposure in the population can decrease the effective herd immunity threshold: The derivation of the herd immunity threshold described above assumes that a vaccine is randomly distributed in the population, and assumes homogeneity in susceptibility and exposure, i.e. that individuals share the same probability of exposure and equal susceptibility to infection. If there is considerable variability across individuals in their susceptibility or exposure, then the herd immunity threshold may in fact be even lower, as suggested by the mathematics presented in this unpublished medRxiv article. The basic idea is that in a heterogeneous population, those who are most susceptible or most prone to exposure get infected first, while those less susceptible or less prone to exposure make up a growing proportion of the remaining susceptibles, making them "harder to reach" for the virus and thus slowing spread. An article published in Science in August presented a mathematical model based on this idea, with assumptions about age-based heterogeneity in behavior and exposure mimicking our current society; given that people in their 20s, 30s and 40s are far more exposed (not just because of parties, as many think, but also because of service jobs), their model estimated the herd immunity threshold to be 43%. The authors warn against taking these results as truth, but they demonstrate this valid principle, and it is plausible that the effective herd immunity threshold is lower than under random vaccination.
The continuum from novel virus to herd immunity: Herd immunity is not really a binary state, but a continuum. While the threshold indicates the immunity level above which epidemic spread is not possible, as immunity levels increase, the susceptible population decreases and viral spread is increasingly suppressed. Computer simulation models demonstrate that suppression of viral spread starts to become noticeable when the immune proportion gets above 20-25% or so, and the suppressive effect increases from there up to the herd immunity threshold. Thus, we could talk about partial herd immunity, meaning the susceptible population has been reduced enough to make a difference in suppressing community spread, and this could potentially be a positive factor in some places when looking ahead to the remainder of 2020 and into 2021. I will share a few other factors that give some hope that the susceptible subpopulation may be smaller than we think, offering potential optimism that the fall and winter surge may not be as bad as feared, and also that a vaccine could make a dramatic difference in the first half of 2021 even if it is not perfectly effective and the proportion of society initially receiving it is not high.
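To see why heterogeneity can lower the threshold, here is a minimal Python sketch. It is not the model from either article mentioned above. It assumes random mixing, differences in susceptibility only (not in infectiousness), and immunity acquired through infection. If group i has population share p_i and relative susceptibility a_i, infection depletes it as p_i exp(-a_i L), where L is the cumulative force of infection. The effective reproduction number is R0 times the susceptibility-weighted share still susceptible, and the threshold is the immune fraction at which that drops to 1. The two-group split below is hypothetical.

```python
import math


def hit_heterogeneous(r0, groups, tol=1e-12):
    """Immune fraction at which R_t falls to 1 when immunity comes from infection.

    groups: list of (population_share, relative_susceptibility) pairs; shares
    should sum to 1 and susceptibilities must be positive. Assumes R0 > 1.
    """
    mean_a = sum(p * a for p, a in groups)

    def rt_over_r0(cum_force):
        # Susceptibility-weighted share of people still susceptible
        return sum(p * a * math.exp(-a * cum_force) for p, a in groups) / mean_a

    target = 1 / r0
    lo, hi = 0.0, 1.0
    while rt_over_r0(hi) > target:  # expand until the root is bracketed
        hi *= 2
    while hi - lo > tol:            # bisection: rt_over_r0 decreases in cum_force
        mid = (lo + hi) / 2
        if rt_over_r0(mid) > target:
            lo = mid
        else:
            hi = mid
    cum_force = (lo + hi) / 2
    still_susceptible = sum(p * math.exp(-a * cum_force) for p, a in groups)
    return 1 - still_susceptible


homogeneous = hit_heterogeneous(2.5, [(1.0, 1.0)])
# HYPOTHETICAL split: half the population 3x as susceptible as the other half
two_groups = hit_heterogeneous(2.5, [(0.5, 1.5), (0.5, 0.5)])
print(f"homogeneous: {homogeneous:.1%}, two groups: {two_groups:.1%}")
```

The homogeneous case recovers the classic 60% for R0 = 2.5. The two-group population crosses R_t = 1 at roughly 51% immune, because the more susceptible half is depleted first. Under random vaccination, by contrast, every group is depleted equally and the threshold stays at 1-1/R0.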
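A crude way to see this continuum is the homogeneous-mixing approximation R_eff = R0 × (1 - immune fraction), sketched below in Python. This formula is my illustrative assumption, not the simulation models referred to above, and it ignores heterogeneity and behavior. It only shows that each increment of immunity reduces transmission well before the threshold is reached.

```python
def effective_r(r0, immune_fraction):
    """Homogeneous-mixing approximation: R0 times the susceptible share."""
    return r0 * (1 - immune_fraction)


R0 = 2.5  # low end of the range quoted in the text
for immune in (0.0, 0.10, 0.20, 0.25, 0.40, 0.60):
    r = effective_r(R0, immune)
    print(f"immune {immune:.0%}: R_eff = {r:.2f} ({1 - r / R0:.0%} less transmission)")
```

Under this approximation, 20-25% immunity lowers R_eff from 2.5 to about 1.9-2.0. That is meaningful, but still well above 1.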
The potential of "T-cell immunity": It is possible that some individuals in the population have some degree of cross-immunity from previous exposure to other coronaviruses, including SARS-CoV-1 and the four coronaviruses that cause some common colds. While few individuals without prior exposure have antibodies that react to SARS-CoV-2 (<1%), it is possible that some have a form of T-cell immunity. SARS-CoV-2-specific T-cells include CD8+ killer cells that directly kill virus-infected cells and CD4+ helper cells that can recruit other cells to fight the virus, either by stimulating the production of SARS-CoV-2 antibodies or through other immune mechanisms. Here is an excellent primer providing an overview of the immune system and how T-cells and B-cells work. There is accumulating evidence that many healthy individuals with no prior exposure to SARS-CoV-2 already have T-cells that react to key proteins of SARS-CoV-2 and may confer some level of cross-immunity: a study in Cell showed that 40-60% of 20 healthy donors had SARS-CoV-2-reactive T-cells, one study in Nature showed that 35% of 68 healthy donors had such T-cells, and another Nature study showed that 53% of healthy donors had such T-cells and that 100% of 23 former SARS-CoV-1 patients still had them 17 years later. It has not been demonstrated whether or how these T-cells might confer cross-immunity to SARS-CoV-2, so any assumption that these individuals are completely immune is premature. However, there is reason to believe that some subset of the population may have some degree of cross-immunity even if they do not show evidence of SARS-CoV-2 antibodies in their blood, and this may confer immunity or at least predispose them to less severe disease. BTW, here is a great article by two of my colleagues at the University of Pennsylvania discussing the role of T-cells in COVID-19 immunity.
Another interesting point on T-cells: A recent report in Cell found results suggesting that serology studies may substantially underestimate the proportion of individuals in the population who have been exposed to SARS-CoV-2, since many show a T-cell response without evidence of antibodies. They found that ~93% of exposed asymptomatic individuals (people who had close contact with confirmed cases but showed no symptoms) demonstrated a detectable T-cell response to SARS-CoV-2, even though only 60% were seropositive. They emphasize that it is not clear whether these people have immunity against infection, or whether these asymptomatic people can potentially spread the virus, so it would be premature to presume these individuals are immune. But they might be, and this might provide a partial explanation for why a subset of infected individuals clear the virus so quickly with few symptoms.
What about re-infections? Do these mean that immunity after infection doesn't last long? There have been recent reports of upwards of 10 individuals showing evidence of a second, independent SARS-CoV-2 infection several months after recovering from a previous infection. In these cases, genetic sequencing of the first and second viruses has confirmed that the second infections are indeed new infections, not residual virus from the previous infection, since the viruses are genetically distinct. Thus, for these individuals, it is clear that their initial post-infection immunity was lost after about 3 months. Can we expect that immunity typically "wears off" after just 3 months? This is a terrifying thought, since that would make herd immunity, whether by vaccine or by natural infectious spread, nearly impossible. While it is clear that some individuals may become susceptible to reinfection after 3 months or so, it is almost certain that this is a very rare occurrence. If it were true for most individuals, we would have expected to see orders of magnitude more reinfections than the 10 or so we have seen in the world. For example, roughly 1 million Americans had been confirmed infected and had recovered by May 1st. Total confirmed cases were 4.6 million on August 1st and about 7 million on September 25th, meaning there have been 2.4 million confirmed cases since August 1st, about 0.7% of the USA population. Suppose half of those recovered by May 1st, 500,000, were susceptible to reinfection on August 1st, 3 months later -- we would then expect 0.7% of them to have become confirmed cases again since August 1st, which would be about 3,500 reinfections in the USA alone. While reinfections may be more likely to be asymptomatic and less likely to be detected as confirmed cases, if there were 3,500 reinfections we would expect to have seen far more than the <5 reinfection cases we have seen in the USA.
Thus, it is a near statistical certainty that the proportion of recovered individuals susceptible to reinfection within 3 months is exceptionally small.
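The arithmetic behind this reinfection estimate can be checked with a minimal Python sketch using the case counts quoted in the text. The US population of roughly 328 million is my assumption. The 50% share of early recoveries becoming susceptible again is the hypothetical scenario from the text, not a measured value.

```python
# Figures quoted in the text
recovered_by_may1 = 1_000_000
cases_aug1 = 4_600_000
cases_sep25 = 7_000_000

# ASSUMPTION: approximate US population
us_population = 328_000_000
# HYPOTHETICAL scenario from the text: half of early recoveries lose immunity by August 1st
share_susceptible_again = 0.5

new_case_rate = (cases_sep25 - cases_aug1) / us_population
expected_reinfections = recovered_by_may1 * share_susceptible_again * new_case_rate

print(f"confirmed-case rate since August 1st: {new_case_rate:.2%}")
print(f"expected confirmed reinfections: about {expected_reinfections:,.0f}")
```

With the unrounded rate of about 0.73%, the expected count comes out at roughly 3,700. This is in the thousands, against fewer than 5 documented reinfections in the USA, and that gap is what the argument rests on.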
In conclusion, there are numerous reasons to believe that a substantial proportion of the USA population may have been exposed and infected, or may otherwise have some degree of immunity to the virus, at this point in time. We are not anywhere near herd immunity yet, but it is plausible that these levels are enough to at least partially suppress viral spread, especially in regions hit hard in the spring or summer. This provides some reasonable hope that we may be pleasantly surprised and any fall or winter surge might not be as bad as expected.
The human immune system is complex, and there is still a lot to learn about how it responds to SARS-CoV-2. While there is evidence of a substantial proportion of individuals carrying T-cells that respond to SARS-CoV-2, one should not treat this proportion as additive with the estimated proportion of seropositive individuals when assessing progress toward herd immunity. We currently have no evidence that the presence of these T-cells confers immunity from infection or from becoming infectious and spreading the virus to others -- it is plausible that these T-cells simply give the individual a faster immune response and less severe disease, without removing them from the susceptible subpopulation or keeping them from infecting others if they are exposed. Further study is needed to characterize the effects of these T-cells on infection rates and spread.
We are uncertain about how long post-infection or post-vaccine immunity lasts, and further investigation is needed to characterize this important factor, as it affects all of our mitigation strategies as well as how well vaccines might work and on what schedule they will be administered. It is wise for previously infected individuals to continue following precautionary guidelines, rather than ignoring them in the belief that they are immune, since it is not clear how long immunity lasts. In spite of anecdotal evidence of reinfections within 3-4 months, immunity likely lasts considerably longer than that, based on a statistical assessment of the expected number of reinfections. Most vaccine development efforts focus on an annual schedule, and at this point there is little reason to expect that they could not be effective.
It would be foolish for any community to presume that any potential immunity will suppress spread enough that they do not have to follow basic precautionary steps to limit spread. There is still considerable risk of a surge, and numerous reasons to expect things to get worse in the fall and winter than they were in the summer. The "herd immunity" strategy is still reckless, and if followed would very likely lead to a great deal of needless mortality and morbidity, especially given reports of serious long-term cardiac effects in many recovered patients, even among those who remained asymptomatic or had mild symptoms. It is especially foolish given the very real potential of a working vaccine to be introduced this winter that could get us to herd immunity without the extra mortality and morbidity. By the way, I think the characterization of Sweden as following the "herd immunity" strategy is not accurate -- true, they did not impose state-mandated lockdowns, but they encouraged numerous mitigation strategies that acknowledge the importance of slowing the spread of the virus, and trusted the people to follow them. The true "herd immunity" or "Disney Frozen" strategy proponents don't urge people to follow these mitigation strategies or acknowledge the importance of doing so.
We should remain vigilant and cautious, but we can keep our eyes open as we learn more about the durability of immunity and potential cross-immunity, and watch the levels of viral spread, hospitalizations and deaths as we move into the fall and winter. While we shouldn't take the optimistic side of uncertainty on all things, which leads to "dangerous optimism" that may result in foolish and damaging behavior, we also need not take the pessimistic side of uncertainty on every possibility. Some may think this is the responsible approach, but I think it is better to consider which possibilities are plausible based on current evidence, think clearly about them, and update as more information accrues.
There are some solid reasons for potential optimism. Maybe it won't be as bad as many are predicting. I am hopeful.