Can't we even trust top journals? Evaluating and processing information in "Wartime Science"

Updated: Jun 12

The COVID-19 pandemic has brought the whole world to its knees and stimulated an unprecedented level of focus and urgency to quickly discover the properties of this novel virus. The goals are to inform policies that prevent its spread, to diagnose it and triage which infections need higher levels of medical observation and care, to identify effective treatments at different stages that slow progression or treat complications arising from the disease without producing other problematic side effects, and ultimately to develop vaccines and possible novel treatments effective against the disease. I am excited about the intense focus and coordination evidenced by a skilled biomedical research community, which I believe can lead to scientific discoveries and translational breakthroughs like novel testing devices, vaccines, and treatments in record time. It is truly an exciting time to be in biomedical scientific research!

Given the urgency, the usual pace of scientific discovery and translational application must necessarily increase. If we proceed with the usual timelines and cautions, then by the time we have figured out what to do the pandemic will have run its course, and the information will not be useful in treating the millions affected in this first wave.

Thus, studies are designed and commenced much faster than usual, observational results are presented as fast as they can be pulled together, preliminary trial results are immediately presented and reported, papers are placed on preprint servers like medRxiv and bioRxiv and results reported in the media and acted upon before peer review has been done, and peer review is often accelerated to ensure that vetted results are made available soon enough to be translated.

While providing real-time evidence to inform and adapt our mitigation, prevention, and treatment strategies, this rush also introduces substantial risk of bad science and false positive results that can harm, rather than help, our efforts. Clinical studies whose design and rollout are rushed risk overlooking important scientific and clinical factors that could lead to studies that cannot effectively answer their intended study questions or even expose patients to unacceptable risk. Observational study results may be presented in ways that do not sufficiently adjust or account for biases inherent to observational studies from nonrandom treatment assignment and confounding factors and provide inaccurate information about treatment efficacy and safety. Selective interim results from trials are reported, potentially driven by political or commercial priorities, and can produce misleading or incomplete information. Invalid results in preprints based on flawed scientific method or improper data interpretation may lead to unwarranted media attention and application before having the opportunity to be caught and corrected by peer review. Rushed peer review procedures may not catch flagrant issues and weaknesses in studies and lead to the dissemination of unreliable information even in top peer reviewed journals. This is why we need to be especially vigilant in critically evaluating results that come in, even those from top journals, realizing the usual quality control procedures used by scientific researchers and regulatory agencies are naturally not going to work as well under this rushed "wartime science" context.

While there are other examples of this problem, in this post I want to highlight one paper published on 5/22 in a top journal, the Lancet, reporting on a huge multinational observational study of hydroxychloroquine (HCQ). It appears to affirm the popular view that chloroquine (CQ) and HCQ are not efficacious treatments, and it raises concerns about cardiac toxicity. As a result of these findings, the WHO removed HCQ from its multinational study, and multiple countries stopped studying HCQ in other trials.

The paper presents information on 96,032 COVID patients from 671 hospitals on all continents (except Antarctica; sorry, penguins), covering hospitalizations from 12/20/19 to 4/14/20. This would indeed be a remarkable and useful database. These data were purportedly accrued by aggregating EHR data from customers of QuartzClinical, the machine learning platform of a company, Surgisphere, whose CEO, Sapan Desai, is second author on the paper.

This paper drew immediate scrutiny in online blogs (see here and here) and in queries placed on PubPeer by Andrew Gelman, a statistician at Columbia University. A few of the key problems raised in these commentaries include:

  • Doubts about the possibility of pulling together this large a data set across many different EMR systems so quickly, given technical issues and privacy concerns. The paper provided no documentation whatsoever of any of this, hiding behind claims of proprietary information in the company Surgisphere that owns the data.

  • Apparently almost no missing data across all of these fields of health conditions, treatments, race, etc., even though missing data is a ubiquitous problem in EMR data. (BTW, it is against the law to record race in some European countries, but this company was apparently able to get this information into their database.)

  • The inherent difficulties such a study would entail, and the question of how it could be done by a team of just 4 co-authors: Surgisphere CEO Desai (who had just resigned from another job in Feb 2020 to focus on Surgisphere), a cardiologist from Switzerland (Ruschitzka), an adjunct professor of biomedical engineering at the University of Utah (Patel), and the medical director of the Brigham and Women's Hospital Heart and Vascular Center (Mehra). One would expect this type of study to have many dozens of authors to manage all of the complex issues and analysis, and clearly a statistician and other analytical personnel would need to be involved in performing an analysis of this scope.

  • Issues with the analysis not adjusting for potential confounding, which is a key issue in observational studies.

  • Serious problems with the reported characteristics of patients split out by continent:

      • Almost all factors, including diabetes, smoking status, BMI, age, etc., are nearly identical across continents.

      • The data include more deaths for Australia (73) from 5 hospitals than all reported deaths in the country as of 4/14 (67).

      • The data include 4402 hospitalized patients in Africa even though only 15,738 cases were reported for the entire continent as of 4/14. It is unlikely that >1/4 of all cases resulted in hospitalizations, and even less likely that all of these hospitalizations would have been captured in their EMR-based database, given the relatively low proportion of hospitals on the continent using advanced electronic medical records.
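
The two numerical red flags in the bullets above are simple enough to check by hand, but writing them down makes the logic explicit. This is only a sketch using the counts quoted in this post; the function names are mine and purely illustrative.

```python
# Counts exactly as quoted in the bullets above; nothing else is assumed.
paper_deaths_australia = 73       # deaths the paper attributed to 5 Australian hospitals
reported_deaths_australia = 67    # all reported deaths in Australia as of 4/14
paper_hospitalized_africa = 4402  # hospitalized patients the paper listed for Africa
reported_cases_africa = 15_738    # all reported cases on the continent as of 4/14


def exceeds_national_total(subset_count: int, national_total: int) -> bool:
    """A handful of hospitals cannot report more events than the whole country."""
    return subset_count > national_total


def share_of_total(subset_count: int, total: int) -> float:
    """Fraction of the reported total that the subset would represent."""
    return subset_count / total


print(exceeds_national_total(paper_deaths_australia, reported_deaths_australia))  # True
print(f"{share_of_total(paper_hospitalized_africa, reported_cases_africa):.1%}")   # 28.0%
```

Neither check proves anything by itself, but a subset larger than its total is impossible, and a database capturing 28% of a continent's reported cases as hospitalizations deserves an explanation.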

An open letter was written to the authors and the Lancet that raised many of these concerns, and some other general ones.

Subsequently, the Lancet published an updated version of the paper and updated supplementary materials on 5/29 (the original article was published 5/22), with the Surgisphere website providing an explanation of the changes. In short, their explanation is that the original table appeared more homogeneous because they mistakenly included a table of "adjusted numbers" that were "propensity score matched and weighted", and they also suggested that one of the five Australian hospitals was mistakenly attributed to Australia when it should have been attributed to Asia. Here are the original and revised tables:

Here is the revised table:

Hmmm. Looking at the Australia data, they reallocated 546 cases from Australia to Asia as they said, and reduced the number of deaths in Australia from 73 to 3, far below the 67 reported deaths. However, they only increased the number of deaths in Asia by 28. Where did the other 42 deaths go?
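
For anyone who wants to see the bookkeeping behind that question, here is the arithmetic as a tiny sketch. It uses only the three death counts quoted above, and the function name is just an illustrative label.

```python
# Death counts as quoted in the paragraph above.
australia_deaths_original = 73
australia_deaths_revised = 3
asia_deaths_increase = 28


def unaccounted_deaths(before: int, after: int, added_elsewhere: int) -> int:
    """Deaths removed from one region that do not reappear in the other."""
    removed = before - after
    return removed - added_elsewhere


missing = unaccounted_deaths(
    australia_deaths_original, australia_deaths_revised, asia_deaths_increase
)
print(missing)  # 42
```

If the correction were a pure relabeling of one hospital from Australia to Asia, this number should be zero.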

Also, if the original table was propensity matched data, it should have had smaller sample sizes than the unadjusted raw data, yet the sample sizes are the same. If reweighted/adjusted in some way, then we would expect some substantial differences in the table -- there are differences but they are very minor differences in most cases, and seem to just be jittered a bit. This does not inspire confidence that these are authentic data. Even if the second table is the authentic raw data table, there still seems to be far more homogeneity across continents than one would expect -- e.g. in current smokers, former smokers, use of ACE inhibitors, etc.
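
To illustrate why matching should shrink the sample, here is a minimal sketch of 1:1 nearest-neighbor matching on a propensity score. Everything in it is an assumption for illustration: the scores are synthetic, the group sizes (30 treated, 170 control) and the caliper are arbitrary, and `greedy_match` is a hypothetical helper, not the method the authors used (which the paper did not document).

```python
import random


def greedy_match(treated, control, caliper=0.05):
    """1:1 nearest-neighbor matching on a propensity score, without replacement.

    treated, control: lists of propensity scores in [0, 1].
    Returns (treated_index, control_index) pairs whose scores differ by
    at most `caliper`; patients with no close enough partner are dropped.
    """
    available = set(range(len(control)))
    pairs = []
    for i, score in enumerate(treated):
        if not available:
            break
        # Closest control that has not been used yet.
        j = min(available, key=lambda k: abs(control[k] - score))
        if abs(control[j] - score) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


rng = random.Random(0)  # fixed seed so the sketch is reproducible
treated_scores = [rng.betavariate(4, 2) for _ in range(30)]   # synthetic scores
control_scores = [rng.betavariate(2, 4) for _ in range(170)]  # synthetic scores

pairs = greedy_match(treated_scores, control_scores)
raw_n = len(treated_scores) + len(control_scores)
matched_n = 2 * len(pairs)
print(raw_n, matched_n)
```

Whatever the seed, the matched sample can contain at most twice the size of the smaller group (60 here, against 200 raw patients), which is why identical sample sizes in a "matched" table and a raw table are hard to reconcile.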

The Surgisphere website tries to explain the nature of their data: that it came from deidentified electronic health records from customers of QuartzClinical, Surgisphere's machine learning analytical platform, and that apparently all of these hospitals purchased this platform and provided their EHR data in an anonymized way as part of the customer agreement. QuartzClinical is discussed online as a new tool introduced in 2019, so it would be truly remarkable for this product to be so universally utilized that this level of international data could be accrued. Note that even the North American data, containing >63,000 hospitalized patients, amounts to >10% of the total number of COVID-19 cases in North America through 4/14 (613,271, from ourworldindata), and this may account for a high proportion of all North American COVID-related hospitalizations through that date. Maybe they have this incredible data source, the most amazing EHR treasure trove in the world, that no one else seems to have known about. I hope they do, as this could provide a lot of great knowledge to the world. But a reasonable person might be skeptical about whether this is indeed the case and, if so, how no one else would have known about it before now.

There is an article in The Scientist that talks in more detail about this controversy and goes into some of the background of the company and CEO, which raises certain other questions. Certainly, a high-impact study like this would springboard an aspiring data science company. Indeed, their website, which prominently highlights the paper, seems to lack substantive details about the history of the company or its purportedly expansive set of international customers, and may read to some as a quickly assembled website with a lot of fluff and little substance.

Update: BTW, here is a blog post by Peter Ellis, who outlines why he believes these data are fabricated and why he does not believe it is plausible for them to have all of this data from all of these hospitals. It gets at some of the doubts I conveyed above but with much more detail and explanation.

In short, the veracity of these data and results is at a minimum questionable, and one might reasonably wonder whether these data are authentic or whether they might even be largely (or completely) fabricated. Given the lack of documentation by the authors, there is no way for peer reviewers to know whether these data are authentic or not, and the rush to publish important studies on COVID-19 may have led one of the most eminent medical journals in the world to publish a potentially fraudulent study. I can't imagine a study with as many questions as this one making it through peer review under normal circumstances. This highlights the danger of "wartime science".

It turns out that on the basis of this "study", the WHO has dropped HCQ from its international study, and the UK and France have dropped the drug. Yikes. And this with Remdesivir receiving "standard of care" status based on scant data.

UPDATE (6/3): There is growing concern about this paper, as well as another published in NEJM by this same company about cardiac medications and COVID-19. Here is an article in The Scientist and another in Science detailing recent events, including an "Expression of Concern" on the Lancet and NEJM websites, with a ticking clock for authors to provide documentation of data veracity (or retractions could be coming). BTW, this article mentions that 0 hospitals have confirmed participation in the study so far, not even the major Boston hospital of the first author (!!!!). The company's LinkedIn page apparently has <100 followers and just 6 employees (no, wait, now just 3 as of today), the "get in touch" link on the Surgisphere page directed to a template for a cryptocurrency website (Desai's last scheme that he didn't follow through on?), and Desai's Wikipedia page has been deleted. Yep, warm up some space on your website, Retraction Watch. A few more coming your way!

Retraction Watch has listed all COVID-19 papers that have been retracted, and so far there are 13, including one in the major journal Annals of Internal Medicine that strongly stated that surgical and cotton masks don't block SARS-CoV-2 in a study based on 4 patients (!). This is especially disturbing as we have learned that wearing even primitive masks can reduce the viral load cast from infected individuals and thus is a PRIMARY tool for viral mitigation, and one that does not destroy the economy and people's lives.

ALSO, the Chinese preprint whose data were used by the Imperial College report that predicted 500k deaths in Britain and 2.2M deaths in the USA, and that guided worldwide lockdown policies, was posted on March 16 and subsequently taken down on March 30 with a promise of reposting once the article was corrected, but it has never been corrected.

Update 6/4/2020: Both Surgisphere papers, the Lancet paper discussed above and a New England Journal of Medicine one on ACE inhibitors, have now been retracted!

UPDATE 6/12: This article describes the results of an audit into Surgisphere and its CEO, second author on this paper. Needless to say, lots of problems there. It highlights the need for rigorous validation and documentation of data sources and is a reminder that there are some bad actors out there.

We have to critically assess all information we receive, not just from unpublished manuscripts, but even published manuscripts, even (and maybe especially) those in the top high profile journals. Dang this is hard -- when you can't even assume peer reviewed articles in the top journals in the world are reliable, how can we begin to discern truth?

We need to evaluate all evidence very carefully, running it through the wringer of statistical data science and other checks, and we need to aggregate this information and communicate it effectively to the world. It is not easy, and there are no shortcuts, but we can sift through the accruing information and synthesize it, and key truths will emerge (and have emerged). In spite of bits of misinformation and poorly interpreted results, we have learned a lot about this virus and are moving forward at a pace not seen for any other disease before. Don't be too discouraged: there is a lot of good science out there. But we live in perilous times, and we need to keep our critical thinking caps handy as we evaluate this fast-accruing information.

©2020 by Covid Data Science.