In late December 2019, computers at the National Institutes of Health (NIH) in Bethesda, Md., received eight pages of genetic code, making the agency an unwitting recipient of critical information about the virus that would soon unleash a pandemic. This genetic map, submitted by Chinese scientists to a U.S. government-run public repository of sequencing data, detailed a mysterious new virus that had infected a 65-year-old man in Wuhan.
At the time of the submission, Chinese officials had not yet disclosed information about the unexplained pneumonia affecting people in Wuhan. The U.S. repository, designed for routine data sharing among scientists, did not immediately add the submission to its database. Instead, three days later, it asked the Chinese scientists to resubmit the genetic sequence with additional technical details, a request that went unanswered.
Nearly two weeks later, a different pair of virologists, one Australian and one Chinese, took the initiative to publish the genetic code of the new coronavirus online. This set off a global effort to combat the virus by developing tests and vaccines.
The revelation of the initial attempt by Chinese scientists to share the crucial genetic code came from documents released by House Republicans investigating the origins of Covid-19. These documents have raised questions about when China became aware of the virus and have underscored gaps in the U.S. system for monitoring potential threats from new pathogens.
Chinese authorities claim they promptly shared the virus’s genetic code with global health officials, but the released documents suggest otherwise, casting doubt on the transparency of China’s response. While the news of the virus being sequenced in late December 2019 has been reported in various sources, the documents provide new details about when and how scientists attempted to share this information globally.
At the same time, the U.S. system for identifying potential threats faces challenges in distinguishing critical information from the routine genetic sequences submitted daily to its repository. The House Republicans’ documents reveal that the NIH repository, GenBank, did not publish the genetic code it received because it was unable to verify it. Despite follow-ups by the NIH, the Chinese scientists did not respond to requests for additional information and corrections.
According to the Department of Health and Human Services, the genetic sequence underwent a technical review, and, after no response from the Chinese scientists, the database automatically deleted the submission from its queue on January 16, 2020. The reasons for the lack of response from the Chinese scientists remain unclear.
The same genetic sequence submitted to GenBank was made public on a different database, GISAID, on January 12, 2020, shortly after other scientists had posted the first coronavirus code. The Chinese scientists later resubmitted a corrected version to GenBank in early February and published a paper detailing their work.
The two-week gap between the initial submission to the American database and China sharing the sequence globally has fueled skepticism about the transparency of China’s response. While the documents do not provide insights into the virus’s origins, they shed light on the challenges faced by the U.S. in monitoring and responding to potential threats in the vast sea of genetic data.
Despite the difficulties in rapidly identifying critical information, some scientists argue that databases like GenBank should be retrofitted to identify sequences with public health implications more efficiently. These modifications could include automated scans for new pathogens whose genetic codes overlap those of known dangerous pathogens, allowing critical information to circulate more widely even while technical details are still awaited.
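To give a rough sense of what such an automated scan might involve, the sketch below compares a submitted sequence against a small watchlist by counting shared short substrings (k-mers). It is an illustration only, not a description of how GenBank works or of what the scientists have proposed in detail: the function names, the threshold, and the toy sequences are all hypothetical, and real screening would more likely rely on established alignment tools such as BLAST.

```python
# Illustrative sketch only: a toy k-mer overlap scan.
# All names, thresholds and sequences here are hypothetical assumptions.

def kmers(seq, k):
    """Return the set of all length-k substrings of a sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def overlap_fraction(query, reference, k=8):
    """Fraction of the query's k-mers that also occur in the reference."""
    query_kmers = kmers(query, k)
    if not query_kmers:
        # Query shorter than k: nothing to compare.
        return 0.0
    return len(query_kmers & kmers(reference, k)) / len(query_kmers)


def flag_submission(query, watchlist, k=8, threshold=0.5):
    """Return (name, score) pairs for watchlist entries that the query
    overlaps at or above the threshold, highest score first."""
    hits = []
    for name, reference in watchlist.items():
        score = overlap_fraction(query, reference, k)
        if score >= threshold:
            hits.append((name, score))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Toy strings, not real genomes.
    watchlist = {
        "known_pathogen_a": "ACGTACGGTCAATGCCGTAGGCTA",
        "known_pathogen_b": "TTTTGGGGCCCCAAAATTTTGGGG",
    }
    submission = "ACGTACGGTCAATGCCGTAGGCTT"  # differs from entry A at one base
    print(flag_submission(submission, watchlist, k=4, threshold=0.5))
```

A flag of this kind would not verify a submission. It would only mark it for faster human review while the usual technical checks continue.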
The revelations from the House Republicans’ documents have reignited debates over global cooperation and information sharing during pandemics. Timely and transparent data exchange remains crucial in addressing emerging health threats, as the early days of the Covid-19 pandemic showed.
The documents also underscore the challenges global health systems face in responding to emerging threats, particularly in sharing data and collaborating across borders. While questions persist about the origins of the virus, attention should also extend to making global data repositories and information-sharing mechanisms more effective, to improve preparedness for future health crises.