The response of the structural biology community to the COVID-19 pandemic

To date, more than 27 million cases of COVID-19 have been registered worldwide, resulting in more than 850 000 deaths. No effective treatment is presently available, and there is an intense race to develop drugs and vaccines against COVID-19. More than 50 companies and 20 institutions are currently participating in these efforts, and some of the roughly 120 candidate vaccines have already advanced to phase III clinical trials.

The development of effective drugs and vaccines requires a deep understanding of the biological targets and knowledge of the three-dimensional shape of the targeted proteins. Structural biology responded within the first days of the pandemic to the challenges posed by the coronavirus threat to human health worldwide, and provided key information that was immediately available to the scientific community to help in the fight against COVID-19.

The first outbreak of COVID-19 was recorded in the city of Wuhan, China, in December 2019. The disease spread rapidly to more than 212 countries and was declared a pandemic by the World Health Organization (WHO) on March 11th, 2020. The scientific community was quick to respond and to try to understand the high infectivity and unique properties of the new virus, which was named SARS-CoV-2. The genome of SARS-CoV-2 was made publicly and openly available on January 10th, 2020, a critical step in the efforts to study the new coronavirus and identify its differences from previously known coronaviruses.

The genome information showed that, as in other coronaviruses, the viral genome encodes a number of structural and non-structural proteins, with some differences in their amino acid sequences. Some of the identified proteins are critical for replication and for attachment to host cells, although all of them play a role, in one way or another, in the life cycle of the virus. The genetic information made it possible to express and purify the SARS-CoV-2 viral proteins for structural biology studies that reveal their three-dimensional shape. Knowledge of the three-dimensional shape of proteins is critical in the development of new drugs: it helps in identifying binding pockets for small-molecule inhibitors and the interactions responsible for eliciting biological responses. Disruption of these interactions could also serve as a starting point for the development of new therapies.

X-ray crystallography and cryo-electron microscopy (cryo-EM) are the two structural biology techniques that can provide the most detailed views of protein structures. With the help of X-ray crystallography, the crystal structure of a key protease from SARS-CoV-2 in the presence of a powerful inhibitor was determined less than a month after the genome sequence of the new coronavirus became available. This was quite an achievement, considering that a crystal structure determination may take months or years. It is a multi-step process that involves expression of the protein, crystallization, data collection, calculation of initial phases, optimization of the structure (refinement), and analysis of the structure. As soon as the structure was finalized, it was deposited in the Protein Data Bank (PDB) and, at the request of the depositors, became immediately available to the scientific community on the 5th of February 2020. Various other structures of the main protease of the virus soon followed and were likewise deposited for immediate use by the scientific community.

During spring, as national lockdowns and travel restrictions were imposed in most countries, travelling to synchrotron facilities for crystallographic data collection was not an option. Several synchrotrons, however, were kept open to operate exclusively on projects related to COVID-19 research. Other synchrotrons offered limited remote access for regular projects; we were able to get such access to MAX IV (Lund, Sweden) and could collect some data for our own projects. At Diamond Light Source (Oxfordshire, U.K.), thousands of crystals of the main protease of SARS-CoV-2 were soaked with compounds from various fragment libraries, resulting in more than 40 structures by April. These structures and the many more that came afterwards (at the time of writing there are 261 structures of the main protease, and the binding of 185 ligands has been elucidated) are currently being studied to develop new compounds as powerful inhibitors of the SARS-CoV-2 protease, in a kind of crowdsourcing project involving many scientists from different disciplines around the world.

The use of cryo-EM was also instrumental in understanding the inner workings of the new coronavirus. The virus enters human cells through the binding of its spike protein to the human angiotensin-converting enzyme 2 (ACE2), an enzyme that is found in the lungs, arteries, intestines, heart, and kidneys. ACE2’s role in the human body is to lower blood pressure, but some coronaviruses are able to hijack it for their own purposes and use it as an entry point to human cells.

In recent years, cryo-EM has become a powerful technique for studying proteins in atomic detail. Owing to the development of extremely sensitive direct electron detectors, the resolution achieved with cryo-EM in routine measurements has improved dramatically, from ~15 Å in 2010 to 3-4 Å today. The highest resolution reported so far for a cryo-EM structure is 1.2 Å, which is equivalent to atomic resolution in X-ray crystallography. Although there are still various hurdles to overcome, such as the high purchase and maintenance costs of the microscope and the preparation of the samples, cryo-EM is expected to play a key role in the study of protein structures for years to come.

During the pandemic, one of the first structures that came out using cryo-EM was that of the viral RNA-dependent RNA polymerase (RdRp), a key enzyme for viral replication. Comparison of its apo (free) form with its complex with the antiviral drug remdesivir revealed how remdesivir can confuse RdRp and halt the replication process. Cryo-EM studies also unraveled in great detail the role of the spike protein in the attachment to the ACE2 receptor. Within a month or so of the SARS-CoV-2 genome sequence becoming available, structural insights into the spike protein started to emerge.

The spike protein is one of the structural proteins of the coronaviruses and a target for the development of therapeutic antibodies, vaccines, and diagnostics. It forms trimers as in other coronaviruses, such as SARS-CoV and MERS-CoV, and is found in two conformations, known as ‘up’ and ‘down’ depending on the position of the receptor-binding domain (RBD) in one of the monomers of the trimer (Fig. 1). In the ‘up’ conformation the RBD is accessible and ready to bind to the ACE2 molecule. In contrast, the ‘down’ conformation makes the RBD inaccessible and unable to bind to the receptor.

Fig. 1. The change of SARS-CoV-2 spike protein between its two conformations (‘up’ and ‘down’). Each chain of the trimer is colored differently. The structures were determined with cryo-EM (PDB accession codes 6vxx and 6vyb). The movie was prepared with CHIMERA.

Knowledge of how the RBD binds to the ACE2 receptor has helped in evaluating the potential of various antibodies found in convalescent plasma. Structures of various antibodies purified from the plasma of recovered COVID-19 patients have been determined in the presence of the RBD, and different binding modes and epitopes have been identified. There are, for example, antibodies that bind close to the receptor-binding motif (RBM) of the RBD and prevent the binding of ACE2. Other antibodies bind away from the RBM and possibly affect the conformation of the spike protein by altering its flexibility. Moreover, a cryo-EM structure of the entire spike protein in its trimeric form with an antibody showed that the antibody was able to recognize certain sugars (glycans) that decorate the surface of the spike protein and contribute to its stability.

To date, there are more than 800 depositions in the PDB for SARS-CoV-2 proteins. Structures have been determined for almost half of the 26 proteins found in SARS-CoV-2. The sheer amount of structural information has led to the creation of a special web portal at the PDB dedicated to the structures of the SARS-CoV-2 proteins and their complexes (https://www.ebi.ac.uk/pdbe/covid-19). This has greatly helped the dissemination and utilization of the results, especially in structure-assisted drug design and vaccine development.
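For readers who want to look at these structures themselves, the accession codes quoted in Fig. 1 are enough to locate the deposited coordinates. The short Python sketch below is only an illustration and not part of the work described here: it assumes the public RCSB file server address pattern (https://files.rcsb.org/download/), and the function name pdb_download_url is a hypothetical helper introduced for this example. It builds the download address for a given accession code without contacting any server.

```python
import re

# Assumption: public RCSB file-server pattern; not taken from the article itself.
RCSB_DOWNLOAD_BASE = "https://files.rcsb.org/download"


def pdb_download_url(pdb_id: str, fmt: str = "cif") -> str:
    """Return the download address for a four-character PDB accession code.

    pdb_download_url is a hypothetical helper name used only in this sketch.
    """
    code = pdb_id.strip().upper()
    # Classic PDB accession codes: one digit followed by three letters or digits.
    if not re.fullmatch(r"[0-9][A-Z0-9]{3}", code):
        raise ValueError(f"not a four-character PDB accession code: {pdb_id!r}")
    if fmt not in ("cif", "pdb"):
        raise ValueError(f"unsupported file format: {fmt!r}")
    return f"{RCSB_DOWNLOAD_BASE}/{code}.{fmt}"


if __name__ == "__main__":
    # The two spike protein structures shown in Fig. 1.
    for accession in ("6vxx", "6vyb"):
        print(accession, pdb_download_url(accession))
```

The resulting addresses can be opened in a browser or passed to any molecular viewer that accepts mmCIF or PDB files.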

The response of the structural biology community to the pandemic has been swift and decisive. Most importantly, the community adopted an open access policy right from the beginning of the pandemic. All structural information was shared quickly and freely, and most papers were deposited as preprints in online repositories for immediate use. Usually, structures are released when papers are published after peer review, but in the case of SARS-CoV-2, all structures became immediately available, before the papers were published or even before they were written.

Structural biology was able to successfully respond to the new crisis mainly for the following reasons:

1) Continuous adoption by the structural biology community of new techniques and practices, leading to improvements over the years in methods, tools, and technologies. Powerful X-rays, ultra-sensitive detectors, robots, increased computing power, and fast algorithms have all contributed to reducing the time needed for structural studies. As many structural biologists feel nowadays, it may take less time to solve a structure than to write the paper reporting it.
2) Open access practices already in place as exemplified by the use of the PDB for many years now and other databases for sequence and genomic data.
3) Strict validation criteria in place for all structures deposited in the PDB, to avoid mistakes or to identify and correct them quickly. In the cases where mistakes were found in SARS-CoV-2 protein structures, they were easily corrected in a scholarly manner. In many cases the raw data (e.g. diffraction images) were also deposited, so many more ‘eyes’ were able to check, re-process, and re-analyze the original data.

The role that structural biology has played and continues to play during the pandemic is a good example of what fast and open access to results and data can achieve. Lessons from the response of the structural biology community could be used to assess current data availability practices and to help ensure that, in times of a public health crisis, scientific information spreads efficiently, ideally faster than the crisis itself.

Anastassios Papageorgiou

The writer, PhD, is Head of the Protein Structure and Chemistry core at Turku Bioscience Centre and Adjunct Professor at the Faculty of Science.