
Texas, we have a (data) problem.

Here is a recent report on some of the data-recording problems in Texas, whose COVID data have been a mess for a long time.

The gist of the problem is this. During the massive viral surge that began in June, laboratories generated a huge surge in COVID test results. The laboratories did the right thing by forwarding these results to state and local health departments for official recording, but many of those offices were not equipped to handle the volume: they had to process results by hand and could work through only a fraction of each day's arrivals. This is understandable, since the system was not built to handle a pandemic like this, much less a massive, out-of-control surge like the ones in Houston, Dallas, and then other Texas cities this summer. The result was a backlog of unrecorded test results that passed half a million by the end of July and eventually reached as high as 850,000. A series of coding errors at three private laboratories running large numbers of tests added to the backlog. These are all real tests and real cases that were missing from the official numbers. That is staggering, because it means the huge surge in cases we saw in June and July in many Texas cities was a substantial undercount of the actual PCR-confirmed cases, and the surge was in fact far worse than the official numbers showed.

They have been working hard to catch up and record these cases, and once they are caught up we will have a better picture of the pandemic in Houston. In August there were reports that the state had cleared the backlog, but because different local health departments report their numbers in different ways, sorting it all out has been an extended process. This article suggests that as backlogged results are entered, they are being dated by the day they are recorded, not the day the laboratory sent the result or the day the sample was taken. This shifts cases forward in time: counts for recent weeks go up, while counts for the summer weeks when those infections actually occurred stay lower than they should be. The net effect is to understate the magnitude of the summertime surge, mask any decline since then, and inflate the current numbers.

This practice will tend to

  1. Increase current weekly average case counts, and

  2. Increase the test positivity rate (since positivity was likely higher during the massive, out-of-control surge period).

This is a serious concern, because these are the primary measures that schools and others are using to judge "safety" for reopening.

They really should date the cases by the date the sample was taken, or at least by the date the laboratory obtained the result. That would place these cases back near the time of infection and give a better picture of the pandemic over time. I hope they think to do that. This is very important.
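To make the effect of the dating choice concrete, here is a small Python sketch using made-up numbers (not Texas data). Each hypothetical result has a sample collection date, the date the health department recorded it, and whether it was positive; the July rows stand in for backlogged results that were only recorded in August. The names `week_start` and `weekly_summary` are just illustrative, not part of any real reporting system.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical test results (made-up, not real Texas data):
# (date sample was collected, date health department recorded it, positive?)
# The July rows stand in for backlogged results recorded weeks later in August.
results = [
    (date(2020, 7, 6), date(2020, 8, 17), True),
    (date(2020, 7, 7), date(2020, 8, 18), True),
    (date(2020, 7, 8), date(2020, 8, 19), True),
    (date(2020, 7, 9), date(2020, 8, 20), False),
    (date(2020, 8, 17), date(2020, 8, 18), True),
    (date(2020, 8, 18), date(2020, 8, 19), False),
    (date(2020, 8, 19), date(2020, 8, 20), False),
    (date(2020, 8, 20), date(2020, 8, 21), False),
]


def week_start(d: date) -> date:
    """Return the Monday of the week containing d."""
    return d - timedelta(days=d.weekday())


def weekly_summary(rows, by_collection_date: bool) -> dict:
    """Map each week to (cases, tests, positivity), attributing every result
    to its collection date or to its recording date."""
    tallies = defaultdict(lambda: [0, 0])  # week -> [positive results, total tests]
    for collected, recorded, positive in rows:
        week = week_start(collected if by_collection_date else recorded)
        tallies[week][1] += 1
        if positive:
            tallies[week][0] += 1
    return {week: (cases, tests, cases / tests)
            for week, (cases, tests) in sorted(tallies.items())}


for label, flag in (("by recording date", False), ("by collection date", True)):
    print(label)
    for week, (cases, tests, rate) in weekly_summary(results, flag).items():
        print(f"  week of {week}: {cases} cases / {tests} tests = {rate:.0%}")
```

With these made-up rows, dating by recording date puts 4 cases out of 8 tests (50% positivity) in the week of August 17. Dating by collection date assigns 3 of 4 (75%) to the week of July 6 and leaves 1 of 4 (25%) in the week of August 17. The current week looks both busier and more positive than it really is, and the July surge disappears from view.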


It is plausible that many of these data artifacts were in the August data and have mostly cleared by now. This could be one reason for the alarming spike in test positivity rate in August, which seemed disproportionate to what looked like declining numbers in hospitals and testing stations. It is also important to take care in how these backlogged data are recorded and accounted for, especially since there appears to be no standardization across the state, so each municipality is making its own interpretation. For example, suppose a county removes the backlogged cases from its daily/weekly case counts because they reflect an earlier time point, but does not also remove the backlogged tests from its daily/weekly test counts. That would artificially bias the test positivity rate lower. The positivity rate is cases divided by total tests, so the county would be correcting the numerator for the old cases without correcting the denominator for the old tests. Care must be taken to get these calculations right, because they are being used to make school decisions that affect the health and welfare of Texas children, teachers, and staff. Careful thought should go into clear, consistent guidelines on how to account for these issues, issued from the top down to ensure consistency and accuracy.
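Here is a quick worked example of that numerator/denominator problem in Python, using hypothetical weekly totals for one county (the numbers are made up purely for illustration, and `positivity` is just a helper defined here):

```python
def positivity(cases: int, tests: int) -> float:
    """Test positivity rate: positive results divided by total tests."""
    return cases / tests


# Hypothetical weekly totals for one county (made-up numbers).
current_cases, current_tests = 300, 3000   # results from this week's samples
backlog_cases, backlog_tests = 200, 1000   # old backlogged results recorded this week

# Backlog left in both the numerator and the denominator.
as_reported = positivity(current_cases + backlog_cases, current_tests + backlog_tests)
# Backlog removed from both: the true current positivity.
fully_corrected = positivity(current_cases, current_tests)
# Backlog cases removed from the numerator, but backlog tests left in the denominator.
half_corrected = positivity(current_cases, current_tests + backlog_tests)

print(f"as reported:     {as_reported:.1%}")
print(f"fully corrected: {fully_corrected:.1%}")
print(f"half corrected:  {half_corrected:.1%}")
```

In this toy case the true current positivity is 10%. Leaving the backlog in both numerator and denominator gives 12.5%, while the half-corrected calculation drops it to 7.5%, below the true value.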

But another, unrelated problem in Texas is also leading to undercounted cases: Texas is not counting antigen tests in its case counts. Because of delays in processing the PCR viral tests that are the gold standard for determining SARS-CoV-2 infection, antigen tests are increasingly being used. They give faster results, often in less than an hour, and do not need to be sent out to a laboratory for processing. Like PCR tests, and unlike antibody (serology) tests, antigen tests detect active infection, so positive tests should be counted as cases; if a follow-up PCR test has also been done, the person should not be double-counted. Antigen tests have very low false-positive rates, so their positives are clearly real viral cases. Their potential weakness is a higher false-negative rate, so they may miss more true cases than PCR tests would.

Texas has not yet adopted the August 5th CDC guidance on how to determine cases. That guidance states that positive antigen tests be considered "probable cases" and reported in the official numbers. It appears Texas is not reporting these as cases, so they are not included in the official case numbers. Because PCR processing times are exceptionally slow in many places in Texas (sometimes more than 2 weeks), many places have been turning to these faster antigen tests more and more. This is leading to a substantial and growing undercount of cases, even if the old backlog gets cleaned up. This needs to be fixed right away, as the new Abbott BinaxNOW antigen test is about to flood the US market with 150 million antigen tests this fall. If these are not counted as cases, we will completely lose track of whether the pandemic is growing or shrinking and will be "flying blind".
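Counting antigen positives without double-counting people is a simple bookkeeping step. The Python sketch below assumes a hypothetical list of results keyed by a de-identified person ID and uses a deliberately simplified rule: a positive PCR makes someone a confirmed case, and a positive antigen test with no positive PCR makes them a probable case. This illustrates the idea only and is not the full CDC case definition; for example, it does not try to resolve an antigen positive followed by a negative PCR.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestResult:
    person_id: str   # hypothetical de-identified patient key
    test_type: str   # "pcr" or "antigen"
    positive: bool


def classify_cases(results):
    """Return {person_id: "confirmed" | "probable"}, counting each person once.

    Simplified rule (an assumption, not the full case definition):
    any positive PCR -> confirmed; otherwise any positive antigen -> probable.
    """
    status = {}
    for r in results:
        if not r.positive:
            continue
        if r.test_type == "pcr":
            status[r.person_id] = "confirmed"   # PCR overrides an earlier antigen result
        elif r.test_type == "antigen":
            status.setdefault(r.person_id, "probable")  # never downgrades a PCR case
    return status


# Made-up example data.
results = [
    TestResult("A", "antigen", True),
    TestResult("A", "pcr", True),       # follow-up PCR: A is still one case
    TestResult("B", "antigen", True),   # antigen only: probable case
    TestResult("C", "pcr", True),
    TestResult("D", "antigen", False),
]

cases = classify_cases(results)
confirmed = sum(1 for s in cases.values() if s == "confirmed")
probable = sum(1 for s in cases.values() if s == "probable")
print(f"confirmed={confirmed}, probable={probable}, total={len(cases)}")
```

In this toy data, PCR-only counting would report 2 cases. Including antigen results adds person B as a probable case, for 3 cases total, and person A is still counted only once even though they tested positive twice.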

BTW, these undercounts from backed-up tests and from not counting antigen tests do not just affect the case counts. They also affect COVID-related hospitalization numbers, since hospitalized patients whose test results do not arrive while they are in the hospital are likely not counted as COVID-related hospitalizations. Likewise, deaths without a positive test result will not be counted as confirmed COVID-19-related deaths (although in some cases they could be counted as "probable COVID-19-related deaths").

Right now, Texas' data are the least reliable in the country and need to be cleaned up. Other states have problems as well, but not to the same degree. This underscores the importance of clear reporting guidelines and definitions from the CDC, and of holding the states and other municipalities accountable for recording the right data. The inconsistency in how the data are reported is obscuring the pandemic and compromising our ability to understand and respond to what is going on.


