Several times I have mentioned the PolicyLab COVID-19 modeling effort, of which I have been privileged to be a part and which I believe is contributing significantly to efforts to get the virus under control. Our paper introducing the model was published in JAMA Network Open two weeks ago.
This modeling effort has been tremendously impactful in helping understand the key factors associated with COVID-19 surges and in identifying counties whose conditions suggest an emerging outbreak. The models and associated projections have generated considerable interest from media and policymakers alike, including the White House Pandemic Response Team led by Dr. Deborah Birx, who has requested to receive model projections each week. She and her team are using these projections as a primary resource in their efforts to identify emerging hot spots, so they can work with community leaders to implement mitigation strategies to get the virus under control. More recently, they have used model projections to assess the efficacy of the team's proposed strategies for slowing spread of the virus without lockdowns in communities where they are followed.
In this post, I will summarize the key characteristics of the model, describe which factors are most predictive of viral surge, and show how the model projections suggest that viral levels can be brought under control without requiring broad shutdowns if communities would just follow a few key precautionary steps based on the guidelines we have been given for months.
This effort is led by David Rubin, director of the PolicyLab at the Children's Hospital of Philadelphia (CHOP), who has built an interdisciplinary research team composed of researchers at CHOP and the University of Pennsylvania. The statistical modeling is led by Jing Huang, a second-year assistant professor in the Biostatistics division at the University of Pennsylvania of which I am director (who BTW is from Wuhan, China) -- she is remarkable and her tremendous skills and drive cannot be overstated. I joined the team in late April as a collaborator to help provide additional statistical ideas, insights and perspective.
The basic idea is to build a hybrid statistical-epidemiological model to identify county-level factors associated with viral spread, and then to generate county-level projections with the hope of early identification of counties at risk of viral surge. Modeling COVID-19 cases at the county level is important since outbreaks tend to start at a local level, and state or country-level modeling is too coarse to reveal the early signs of an emerging outbreak. The fact that this is one of the few modeling efforts focusing on county-level data is one major reason for its high impact.
The modeling effort relies on various data sources, including county-level COVID-19 incidence data from the New York Times and usafacts.org, social distancing information from GPS data from cell phones compiled by Unacast, temperature and humidity data from NOAA, demographic data from the US census, and testing data from covidtracking.com. The original effort modeled 211 counties that had sufficient cases for modeling in April, but it has been expanded to include 747 counties.
First, the effective reproduction number, R, is estimated for each county and each day from the case counts. This number represents the number of people infected by each infected person in that county under conditions and practices at that time, and provides a time-varying measure of viral spread. When R>1, the number of daily infections is growing; when R~1, the number of daily infections is stable; and when R<1, the virus is in decline in that community. If the virus is left unconstrained, it is estimated that SARS-CoV-2 will have a reproduction number of R0=3 or so, meaning it would grow exponentially with a base of 3. The key in managing the crisis is to implement community mitigation strategies to get R<1 so the viral levels decline to a manageable level where testing and contact tracing can be effectively used.
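To make the meaning of R concrete, here is a minimal Python sketch of how a constant R translates into growth or decline from one generation of infections to the next. It is only an illustration, not the PolicyLab estimation method: the function name, the starting count of 100, and the assumption that R stays fixed across generations are all mine.

```python
def project_generations(initial_cases, r, generations):
    """Expected new infections in each generation, assuming R stays constant.

    Illustrative assumption: every infected person infects exactly `r` others
    in the next generation, so counts follow initial_cases * r**g.
    """
    return [initial_cases * r ** g for g in range(generations + 1)]


# Hypothetical starting point of 100 infections, followed for 4 generations.
unmitigated = project_generations(100, 3.0, 4)  # R0 of about 3: triples each generation
stable = project_generations(100, 1.0, 4)       # R ~ 1: flat
declining = project_generations(100, 0.8, 4)    # R < 1: shrinking

for label, series in [("R=3.0", unmitigated), ("R=1.0", stable), ("R=0.8", declining)]:
    print(label, [round(x, 1) for x in series])
```

Even a modest drop below 1 compounds quickly, which is why the goal of mitigation is stated in terms of getting R<1 rather than hitting a particular case count.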
Next, a sophisticated statistical regression model, a distributed lag non-linear mixed effects model, is used to relate R to county-level factors. The key factors in the model include:
Population density, with higher density leading to higher R
Social distancing, measured by percent reduction in visits to non-essential businesses from pre-covid levels. This factor is lagged 4-14 days, accounting for the fact that cases are not counted until a positive SARS-CoV-2 test is obtained, which is typically lagged 4-14 days after day of infection. This is one of the strongest factors in the model, with more social distancing in the county resulting in lower R values, and this variable seems to broadly capture community behavior and mobility.
Wet-bulb temperature ("feels like" temperature including effect of humidity), also lagged 4-14 days. Our models have consistently found a bimodal temperature effect, with both low and high temperatures corresponding to increased R, and moderate temperatures a lower R. This effect may be picking up on community behavior, with low or high temperatures driving people to spend more time indoors where the virus spreads up to 20x more efficiently.
Testing positivity rate (rolling 7-day average), measuring the proportion of viral tests coming back positive, with higher testing positivity rate suggesting higher R. The inclusion of this factor in the model prevents it from calling surges in counties whose increasing case counts are simply due to increased testing, and also provides a key leading edge indicator of a surge, as we have observed that increasing test positivity rates are often harbingers of viral upticks visible even before it is evident from the case counts.
County-level random effect, which essentially tracks the recent trend in R values over time for that county, conferring greater risk of surge on counties whose R levels have been consistently increasing, and lower risk on those whose R levels have been decreasing.
Metropolitan area and regional random effects, which provide a bridge among counties in the same metropolitan area or broader region. This allows the model to borrow strength from nearby counties, such that a surge in a nearby county increases the probability of a surge for a given county.
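As a rough illustration of how the lagged and rolling inputs in the list above could be prepared, here is a small Python sketch. It is a simplification under my own assumptions: the actual distributed lag model estimates how much weight each lag gets, whereas this sketch just takes an equal-weight mean over the 4-14 day window, and it computes test positivity as pooled positives over pooled tests across 7 days, which is only one possible reading of "rolling 7-day average". The function names and toy data are hypothetical.

```python
def lagged_window_mean(series, day, min_lag=4, max_lag=14):
    """Equal-weight mean of `series` over days [day - max_lag, day - min_lag].

    Returns None when the window would reach before the start of the data.
    """
    start = day - max_lag
    end = day - min_lag
    if start < 0:
        return None
    window = series[start:end + 1]
    return sum(window) / len(window)


def rolling_positivity(positives, tests, day, window=7):
    """Pooled test positivity over the `window` days ending on `day`.

    Returns None when the window is incomplete or no tests were reported.
    """
    start = day - window + 1
    if start < 0:
        return None
    total_tests = sum(tests[start:day + 1])
    if total_tests == 0:
        return None
    return sum(positives[start:day + 1]) / total_tests


# Toy data: percent reduction in visits to non-essential businesses, by day.
distancing = [50, 48, 47, 45, 44, 42, 40, 38, 37, 35, 33, 30, 30, 29, 28, 27]
print(lagged_window_mean(distancing, day=15))   # mean over days 1..11

# Toy data: daily positive tests and total tests.
positives = [5, 6, 8, 9, 12, 15, 20]
tests = [100, 100, 110, 105, 110, 115, 120]
print(rolling_positivity(positives, tests, day=6))
```

Lagging the covariates this way lines them up with the date of infection rather than the date a case was reported, which is the reason the post gives for the 4-14 day window.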
The team is always looking to improve the model, currently exploring the addition of masking data as well as adjustments for cumulative community infection levels so it can capture potential herd immunity effects as viral exposure levels increase in hard-hit areas. We use rigorous training-validation strategies to update and test the model, with details about model validation found here. For more details, an overview of the project is found here and an abstract describing the model is found here.
These factors, along with autoregressive errors that enable the model to learn from the most recent daily data for a given county, enable the model to produce county-level projections of future case counts, which are updated each Wednesday and posted on the PolicyLab website here. These projections along with a weekly commentary are shared with the media and policymakers including Dr. Birx's team and have been a useful tool in alerting communities whose conditions are ripe for a viral uptick or surge.
This model has proven to be useful for identifying emerging hot spots. In mid-May, when Houston was only seeing about 200 cases/day, the model projected that Houston would surge in June to daily case counts over 1000/day. I wrote a blog post about this at the time, which was incredulously received by many of my skeptical friends in Texas.

Of all counties in the USA, this was the one predicted to have the steepest increase and have the greatest chance to become a national epicenter. Numerous factors contributed to this prediction, including high population density, relatively high testing positivity rate, relaxed social distancing, plus the temperature effect, as the model seemingly accounted for the fact that individuals would be moving indoors as the temperatures increased in June. Also, the county-level random effect could have been a major factor, as the R for the county had been steadily increasing since mid-April, which suggested a consistently growing level of infections. See below the estimated R over time in Harris County (Houston), and the levels that led to projected sharp exponential growth.

As can be seen by the plots below, June did see Houston surge to >1000 cases/day and become a national epicenter of the disease, although later in the month than our model projected.

This surge is also well-reflected in the spike in R that was experienced in mid-to-late June that led to a period of sharp exponential growth with a base of 2-3, nearly as high as the virus would produce with no mitigation strategies.

The model has also accurately flagged other counties around the country as emerging hot spots, some in the southern and western US that experienced surges in June-July, but also other specific counties that experienced local surges not evident at the state level.
The model is not Nostradamus -- we do not expect it to precisely predict future cases, which depend on many unknown random factors like future temperature, social distancing, mask-wearing practices, and travel patterns in the county. Its best use is to identify counties whose conditions are ripe for further viral growth or decline, and especially to identify which are ripe to become an emerging hot spot so community action can be taken. As previously mentioned, this is how policy makers have been using the projections, which are also useful for evaluating how changes in community behavior may affect future viral levels in a community.
Dr. Birx's White House Coronavirus Task Force has put together a series of community-targeted mitigation strategies whose goal is to get viral levels under control without lockdowns, and recently shared them with state governments. All of them include universal mask wearing and physical distancing of 6 feet from non-household members, and they come in three levels:
Red Zone: assumes bar and gym closures; 25% indoor restaurant capacity; gathering size limitation of 10 people; and reduced individual travel to non-essential businesses to 25% of normal
Yellow Zone: assumes bar closures; 50% indoor restaurant capacity; 25% gym capacity; gathering size limitation of 25 people; and reduced individual travel to non-essential businesses to 50% of normal
Green Zone: assumes 25% indoor bar capacity; 75% indoor restaurant capacity; 50% gym capacity; and recommended limiting gathering sizes.
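For readers who like to see the three levels side by side, here is one way they could be written down as a small Python structure. The field names are my own; the values are taken directly from the list above, with closures encoded as 0% capacity and `None` used wherever no specific number is given for that zone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ZoneAssumptions:
    """One mitigation level; capacities are percent of normal, 0 means closed."""
    bar_capacity_pct: int
    restaurant_capacity_pct: int
    gym_capacity_pct: int
    gathering_limit: Optional[int]          # None: limits recommended, no number given
    nonessential_travel_pct: Optional[int]  # None: not specified for this zone


ZONES = {
    "red": ZoneAssumptions(
        bar_capacity_pct=0, restaurant_capacity_pct=25, gym_capacity_pct=0,
        gathering_limit=10, nonessential_travel_pct=25,
    ),
    "yellow": ZoneAssumptions(
        bar_capacity_pct=0, restaurant_capacity_pct=50, gym_capacity_pct=25,
        gathering_limit=25, nonessential_travel_pct=50,
    ),
    "green": ZoneAssumptions(
        bar_capacity_pct=25, restaurant_capacity_pct=75, gym_capacity_pct=50,
        gathering_limit=None, nonessential_travel_pct=None,
    ),
}

for name, zone in ZONES.items():
    print(name, zone)
```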
You may recognize some of these guidelines as her team has had success in convincing local communities to implement some of them. BTW, side note: from what I've seen, she and her team have been working tirelessly for months in trying to identify areas in danger of upticks and surges and engaging with community leaders to urge them to take the steps necessary to get them under control. From my perspective, she is the real deal, an apolitical public servant pushing herself to the limit in a difficult work environment to make an impact in our society -- which is why I am frustrated by the recent political attacks she has sustained from both sides. Anyway, ...
She asked the PolicyLab team to produce projections of the level of mitigation these strategies might provide. We approximated these levels by assuming the social distancing levels were >50%, 40-50%, and 30-35% below pre-pandemic levels in that county. The projections under these scenarios are also provided on the website here. Following are the projections of future case counts and R for Harris County (Houston) under these scenarios.


The dotted blue line is our standard projection, which assumes a continuation of current social distancing levels for that county. The solid lines indicate the green, yellow, and red zone projections, and the dotted line indicates projections assuming lockdowns were reintroduced. Note how the yellow and red zone projections indicate a reduction of R to a level well below 1 and a corresponding sharp reduction of daily case counts that, if realized, could get the community viral levels under control by mid-September, enabling safe school reopenings. Also, note that the "red" zone projections produce viral mitigation at levels close to full lockdowns, while assuming businesses are still kept open and people are still engaging in societal activities, albeit while taking the recommended precautions.
These are only model projections, but highlight what is surely a true principle of managing this pandemic -- we CAN manage this virus without lockdowns, but ONLY with well-designed community guidelines and responsible individual behavior embraced by the ENTIRE community.
Putting into practice what we have learned about how this virus spreads, we know enough to construct targeted mitigation strategies comprised of basic individual precautions including mask wearing, social (physical) distancing, and avoiding crowded enclosed places as well as minimal community restrictions including closing or restricting capacity in the most spread-prone settings. If we can successfully implement and comply with these strategies, most of society can operate at near-normal levels while keeping the virus under control and manageable until vaccines are (hopefully) available to finally smack it down for good.
My hope is that this modeling effort can continue to be refined, incorporating more data and important factors as they become available, and play a role in helping with this viral management.
Maybe that’s why we are flat but at a high number of cases. We are between the two zones.
This is very helpful. I would put Houston between green and yellow. Bars closed but some reopening as restaurants. Gyms at 50%. The strangest thing to me is no real defined limits on indoor gatherings. Weddings/receptions/church still held indoors. Ma