Overview

COVID-3D is an interactive structural protein resource aimed at users wishing to analyse protein structure and missense mutational effects for effective drug and vaccine development against the pandemic virus. Our tool explores different protein and mutational properties, which are described in detail below. An initial List View enables the user to choose or search for a specific protein of interest. The individual protein pages are separated into six main sections: 3D Structure, Sequence, Mutations, Normal Mode Analysis, Molecular Dynamics and Downloads, easily accessible through the right-hand side panel of the webpage. To permit clear visualisation of the analyses carried out, user-driven options have been embedded as checkboxes and drop-down menus.


Supported browsers: Chrome, Firefox, Opera and Brave.

List View

The home page of the website presents a link to a protein list view (under the “Browse data” button, or “Browse” menu tab), where proteins are presented as different ‘cards’ summarising protein name, GenBank code, structure figure and MTR and RVIS scores, showing protein tolerance to accumulation of missense mutations.


Proteins can be found through their common name (e.g. Spike) or through their GenBank identifier (e.g. QHD43416) using the Search function (1). Clicking on Details allows the user to navigate to the protein’s page.


Gene Essentiality Scores

Each SARS-CoV-2 protein in the List View has two gene-essentiality scores: MTR and RVIS (2).

  • Missense Tolerance Ratio (MTR): MTR measures a protein’s tolerance to missense variants. An MTR score is strictly positive. A score between 0 and 1 implies that a protein is intolerant to missense variants, while a score greater than 1 implies that there are more mutations than would be present under no selective pressure, and hence that the protein is tolerant to missense mutation.
  • Residual Variation Intolerance Score (RVIS): RVIS is an alternative measure of gene essentiality that compares the number of common functional variants (missense variants that have a Minor Allele Frequency (MAF) greater than 0.01%) to the total number of variants. A negative RVIS score implies intolerance, while a positive score implies tolerance to functional variation.
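
The interpretation rules above can be sketched in a few lines of Python. This is an illustrative helper only, not part of COVID-3D: the function names are hypothetical, and the handling of the boundary values (an MTR of exactly 1, an RVIS of exactly 0) is an assumption, since the text does not say how they are treated.

```python
def interpret_mtr(score: float) -> str:
    """Interpret a Missense Tolerance Ratio using the rules described above."""
    if score < 0:
        raise ValueError("MTR scores cannot be negative")
    if score < 1:
        return "intolerant"
    if score > 1:
        return "tolerant"
    # Assumption: a score of exactly 1 is reported as neutral.
    return "neutral"


def interpret_rvis(score: float) -> str:
    """Interpret a Residual Variation Intolerance Score using the rules above."""
    if score < 0:
        return "intolerant"
    if score > 0:
        return "tolerant"
    # Assumption: a score of exactly 0 is reported as neutral.
    return "neutral"


# Example with made-up scores.
print(interpret_mtr(0.8), interpret_rvis(1.2))
```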

Interactive 3D Protein Structure Viewer

Using the Interactive 3D Protein Structure Viewer

The Interactive 3D Viewer allows the user to view their protein of interest with various different options allowing them to assess the impact of missense variants on the structure. In particular, the screenshot option (3) enables the user to create informative images according to their preferred property, e.g. conservation. Different features are outlined below:

  • Background: Users can choose between a white and black background (4).
  • Representation: Users can select between the available visual representations, similar to PyMOL viewing options (5).
  • Color Scheme: The protein structure and its components can be colored according to various generated features, such as hydrophobicity, electrostatic potential and epitopes present. Once a scheme is selected, a colored surface view is generated, and users can modify the opacity of this surface using a slider (6).

Proteins that have multiple structures have a ‘Structure’ drop-down menu (7) that enables the user to select between the corresponding structures. Users can interact with the viewer (8) using mouse controls, or by interacting directly on touch-enabled devices.

  • Rotation: Left-click and drag
  • Zooming in and out: Scrolling with the middle mouse button.
  • Drag the structure: Right-click and drag.
  • Re-center: Middle mouse button click will re-center the viewer on a specific atom/residue.




Coloring the Structure Schematically

Within the interactive 3D structure viewer, we have embedded different structural analyses onto the protein for easy visualisation of property patterns across the whole protein.


  • Chain: Different chains within a complex are visualised in distinct colors
  • Residue: Each residue is colored according to residue type
  • Secondary Structure: Alpha helices within the structure are colored in pink, beta sheets in yellow and loops/disordered regions in white
  • Hydrophobicity: Residues colored according to hydrophobicity
  • Electrostatic potential: ranging from negative (red) to positive (blue) on a scale for coloring the surface representation. Opacity of surface representation is also user adjustable
  • Conservation: ranging from not conserved (red) to conserved (blue) on a scale for coloring the surface representation. Conservation calculated by ConSurf as rates of evolution
  • Linear Epitopes: regions considered as linear epitopes are colored in light orange
  • Discontinuous Epitopes: regions considered as discontinuous epitopes are colored in light orange
  • MTR score: ranging from intolerant (red) to tolerant (blue)
  • COVID-19 variant frequency: colored according to frequency within our dataset from 1 (pale yellow) to > 200 (red)

Finding binding pockets

The protein interactive viewer also permits visualization of possible druggable regions within the protein, as identified by Ghecom. Selecting the “Pocket” checkbox (9) highlights these regions on the structure one by one, and the user can select their preferred pocket according to volume under the “Pockets” drop-down menu (10). For example, the figure below shows the druggable pocket of the apo main protease, which has a volume of 720 Å³. When loading the holo forms of the protein, the user can see that this pocket is occupied by the ligand, which validates the druggability potential of the site. This type of visualization is particularly important in drug discovery efforts, especially in designing resistance-resistant drugs, as the user can combine druggable region information with positional and functional information on the circulating COVID-19 variants (by clicking on the “COVID-19 variants” checkbox (11)), as well as on variants from known similar viruses such as SARS.


Running Virtual Screening

To complement drug and fragment binding site analyses, we have embedded a one-click virtual screening function through EasyVS. The user can process the protein of interest by clicking the ‘EasyVS’ button (12) within the 3D Structure panel, and will be directed to the EasyVS website to run virtual screening on different structure libraries and according to user preferences (e.g. Lipinski’s Rule of 5).


Visualizing fragment hotspots

To further enable drug discovery efforts, we have included a fragment hotspot map for each structure, which highlights the areas that bind small fragments with high affinity. While these hotspots align with the binding pockets calculated by Ghecom, they additionally provide information on the nature of the preferred binders: hydrogen bond donors (blue), hydrogen bond acceptors (red) and apolar (yellow) fragment moieties. The user can visualize the different types of hotspots by ticking their respective checkboxes (next to “Fragment Hotspots”) (13), while the size of the hotspots can be adjusted according to level (14): from hotspots which likely bind and interact with fragments (level 17), to regions which exhibit reasonable interactions with fragments (level 14), to regions which are large enough to ‘grow’ fragments into drugs but are unlikely to bind small fragments (level 10). For drug discovery efforts, this information is important since it allows the user to identify highly “reactive” allosteric sites within the protein, which can be used for novel lead discovery.
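
The three hotspot levels can be read as cutoffs on a hotspot score. The sketch below illustrates that reading; it assumes each level acts as a lower bound (so a region scoring 15 is shown at levels 14 and 10 but not at 17), which the text implies but does not state. The function name and the fallback label are hypothetical.

```python
def describe_hotspot(score: float) -> str:
    """Map a fragment hotspot score to the level descriptions given above.

    Assumption: levels 17, 14 and 10 are treated as lower-bound cutoffs.
    """
    if score >= 17:
        return "likely to bind and interact with fragments"
    if score >= 14:
        return "reasonable interactions with fragments"
    if score >= 10:
        return "large enough to grow fragments, unlikely to bind small fragments"
    return "below the lowest displayed level"


# Example with made-up scores.
for example_score in (18.2, 15.0, 11.5, 6.0):
    print(example_score, "->", describe_hotspot(example_score))
```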


Selecting COVID-19 Variants and Properties

The user can get a general overview of variant distribution across the structure by ticking the “COVID-19 variants” checkbox (15). Apart from residue position, the user has the option of viewing variants colored by frequency (i.e. the number of times that variant has been observed across all sequenced SARS-CoV-2 genomes) by ticking the “Variant Frequency” checkbox (16). The figure below shows variants spread across the main protease, colored according to frequency within our analysis, and those having a coupling score (17) of 0.259. While knowing variant and frequency distribution across the protein is important, evolutionary coupling measures provide the user with added information on the specific variants which tend to develop together. This insight, coupled with the mutational information on protein structural and functional effects, can further inform the user on the possible role of these mutations for viral survival and replication.
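
As a concrete illustration of what "frequency" means here, the sketch below counts how often each variant appears in a list of observations. The variant labels are invented for the example, and the function name is hypothetical; it is not the pipeline used by COVID-3D.

```python
from collections import Counter


def variant_frequencies(observations):
    """Count how many times each variant label occurs in the observations."""
    return Counter(observations)


# Made-up variant labels, one entry per sequenced genome carrying the variant.
observed = ["A10T", "A10T", "G15S", "A10T", "K90R"]
frequencies = variant_frequencies(observed)

# List the variants from most to least frequently observed.
for variant, count in frequencies.most_common():
    print(variant, count)
```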


Variation in Homologs: Bat RaTG13 and SARS

Apart from viewing the circulating SARS-CoV-2 variants, we have added variant tables from known homologous organisms, Bat RaTG13 and SARS, which the user can analyse in a similar way. These variants are particularly useful in therapeutic development: since the homologous organisms have been circulating for longer periods of time, their resulting variation is likely to be advantageous for survival. These variants therefore give a snapshot of possible SARS-CoV-2 mutations which might affect transmission and response to therapeutics.

Human ACE2/B0AT1 Variants (Spike Protein)

As one of our Spike proteins is bound to human ACE2 and B0AT1, we have added an extra section in the visualization tool to permit users to visualize the structural distribution of human variants across eight different populations: Latino and Hispanic, East Asian, British, African and African American, Finnish, South Asian, Ashkenazi Jewish and Non-Finnish Europeans (18). Variant information for the human genes is also available as a separate mutation table. This feature enables the user to analyse possible differences in human populations with respect to protein binding to Spike, as well as complementary effects of mutations between Spike and ACE2 across different populations.


Protein Sequence View

The Protein Sequence View provides a linear overview of the features within a SARS-CoV-2 protein, including:

  • Sequence: Wild-type primary protein sequence.
  • 3D structure: Portion of the 3D structure that has been modelled (or crystallized).
  • Variants: Observed variants circulating in SARS-CoV-2 genomic sequence data.
  • Linear Epitopes: A linear epitope is a linear sequence of amino acids (i.e. primary structure) that is recognized by an antibody. The user can view the linear epitope prediction, produced using DiscoTope 2.0, on the Sequence Viewer.
  • Discontinuous epitopes: A discontinuous epitope, like a linear epitope, is recognized by an antibody, but consists of amino acids that are brought together by protein folding and are not necessarily sequential. ElliPro was used to produce the discontinuous epitope prediction.
  • Lollipop plot: A lollipop plot is also present showing the position and frequency of variants across the gene.

Using the Protein Sequence View

For any feature, the user can zoom into a specific region by clicking and dragging across that region. Right-click the sequence viewer to return to the full sequence view.



Viewing sequence features in the structural view

Clicking on any feature (such as a variant or a linear/discontinuous epitope) will present the selected regions on the protein structure view.


Mutation Table and Mutation Analysis

The mutation table can be accessed via the “Mutations” panel shortcut on the right-hand side of the webpage, as seen in the figure below (19). It covers variants from SARS-CoV-2 (20), except for variants unique to GISAID, as well as from SARS (21) and Bat RaTG13 (22). The user can choose what type of information to view from:

  • Residue Properties: these represent the local environmental properties of the residues subjected to variation (23). Information in this section describes the frequency of mutations within our tested sequences, residue depth within the structure, Phi and Psi angles of the wildtype residue, and relative surface accessibility. On their own, they provide a general snapshot of the variation sites. Coupled with other properties such as stability scores, however, they can help the user draw conclusions on which mutations can, for example, lead to resistance, and therefore which should be avoided in drug design efforts.
  • Stability/affinity scores: the changes in stability, dynamics and affinities to ligands, nucleic acids and interacting proteins (24). These are generally calculated as ΔΔG in kcal/mol, save for ligand affinity changes, which are given as log(affinity fold change), and ENCoM vibrational entropy changes (ΔΔS), which are given in kcal/mol/K. Values can be either positive (increase in property, e.g. stability) or negative (decrease in property). Notably, because the entropy term enters the Gibbs free energy equation, ΔG = ΔH - TΔS, with a negative sign, entropy values have the opposite direction to enthalpy values (a positive entropy change is associated with a decrease in stability, and vice versa). Mutations are considered to have milder effects when their values are close to 0 (-0.5 to 0.5), with stronger, possibly detrimental effects at the extremes (< -1.00 and > 1.00). Values for the different affinities (to ligands, other proteins and nucleic acids, where applicable) have been filtered according to distance, with effects of mutations beyond 10 Å considered negligible. Information on the distance to these interacting regions is also available on the website. These values are particularly important to consider, as they measure the effects of introducing a mutation on structural properties and functions, providing insight into the possible consequences of mutations. Coupled with other information, such as mutational frequency and different evolutionary metrics, these values can be used to predict overall effects.
  • PSSM scores: different Position-Specific Substitution Matrix scores are shown on the website, and, because they are derived from multiple sequence alignments of different, but functionally similar proteins, they are a representation of functional properties as a result of evolution (25).
  • AAindex: these are numerical representations of amino acid physicochemical and biochemical properties (26).
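
The reading rules for stability/affinity scores given above (magnitude bands, the reversed sign for entropy, and the 10 Å distance filter) can be summarised in a short sketch. The function names are hypothetical, and the "intermediate" band for magnitudes between 0.5 and 1.0 is an assumption, since the text only names the mild and strong ranges.

```python
from typing import Optional


def classify_effect(value: float) -> str:
    """Band a predicted change by magnitude, following the ranges above."""
    magnitude = abs(value)
    if magnitude <= 0.5:
        return "mild"
    if magnitude > 1.0:
        return "strong"
    # Assumption: values between the two named ranges are called intermediate.
    return "intermediate"


def stability_direction(value: float, is_entropy: bool = False) -> str:
    """Report the direction of a change; entropy values are read with the sign flipped."""
    if value == 0:
        return "neutral"
    increases_stability = value > 0
    if is_entropy:
        # A positive vibrational entropy change is associated with lower stability.
        increases_stability = not increases_stability
    return "stabilising" if increases_stability else "destabilising"


def affinity_effect(value: float, distance: float, cutoff: float = 10.0) -> Optional[float]:
    """Keep an affinity change only if the residue lies within the cutoff (in Å)."""
    return value if distance <= cutoff else None


# Example with made-up values.
print(classify_effect(-1.4), stability_direction(-1.4))
print(stability_direction(0.6, is_entropy=True))
print(affinity_effect(-0.9, distance=12.5))
```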


Consequences of a given mutation

While the values presented in the mutation table are valuable on their own, combining different sources of information presents a more holistic approach for understanding the underlying molecular mechanisms. Notably, it is important to remember that both positive and negative measurements can have an impact on the overall protein structure and function; for example, a local increase in stability may rigidify a loop, preventing conformational changes required for ligand binding. Moreover, the same mutation, depending on its position, may affect multiple different properties to different extents. Because of this, it is recommended that, while analysing the values, local function-specific changes such as affinity changes are considered separately from global effects like protein stability and dynamics. Mutations having overall mild effects should also be considered, especially in relation to their frequency, as mild structural and functional changes may permit retention within a viral population.



Mutation details

Each mutational entry within the Mutation Table has a ‘Details’ link (27) which opens up a page to show the local interactions observed by the wildtype, and mutant (based on the user’s choice in the “Structure” drop-down menu).


Consequences of the given mutation are shown (28). The user can visualize these interactions through the 3D interactive viewer (29), where information on evolutionary couplings, based on score, can be added, and different types of interactions can be shown or hidden from the viewer. The 3D viewer permits visualization of one structure at a time, and, by default, all interactions are shown, while evolutionary couplings are hidden from view. To enable the user to easily compare interactions between wildtype and mutant, we have added a 2D interactive viewer, visible irrespective of what is shown in the 3D viewer (30). Here, mutant interactions are visualized as solid lines, while wildtype interactions are depicted as dashed lines, and the user can choose the interactions and the structural form to view. A list of co-occurring mutations is also provided (31).


Mutation analysis

We have created a built-in tool to explore correlations (32) and distributions (33) of calculated properties of circulating variants within COVID-3D, which allows for hypothesis generation. It is accessed by clicking on "Analysis" in the mutation table.
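
For readers who download the mutation data and want to reproduce a correlation offline, the sketch below computes a Pearson correlation coefficient between two property columns. The choice of Pearson is an assumption made for illustration; this page does not state which measure the Analysis tool uses, and the values shown are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(spread_x * spread_y)


# Made-up values for two properties of the same five mutations.
stability_change = [-1.2, -0.4, 0.1, 0.6, 1.3]
residue_depth = [6.1, 4.8, 3.9, 3.5, 2.2]
print(round(pearson(stability_change, residue_depth), 3))
```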



Normal Mode Analysis

Normal Mode analysis represents the overall protein dynamics in an approximated fashion, by focusing on one specific conformation and applying harmonic motion to it. Results of normal mode analysis on each protein are available for the viewer (34), and are presented as:

  1. Porcupine plots (35), which show the dynamic vector of regions within the structure under dynamic changes.
  2. Modes visualisation (36), showing the trajectory of the protein across different modes in a dynamic and static representation.
  3. Deformation energy (37), highlighting protein regions according to residue deformation energies (i.e. the local protein flexibility), where blue is low, white is moderate and red is high, and with varying thickness, also depending on magnitude.
  4. Atomic fluctuation (38), representing the amplitude of the absolute atomic motion across the protein, rendered consistently with deformation energy.


Molecular Dynamics

At the bottom of each protein page, we have added a molecular dynamics video (39) for the user to see how the protein moves within a time lapse. This permits the viewer to identify specific regions within the protein which are more dynamic, and keep that information in mind when analysing stability effects of mutations in those regions. Proteins available in different forms, e.g. holo and apo, have been analysed and uploaded separately.



Downloading COVID-3D Data

Finally, the user has access to download the protein structures used for calculations, and the mutations being analysed for each protein across the COVID-19 genome. Each protein and respective mutations can be downloaded from the individual protein pages (40).


The source code used in our analysis is also available under the “Code” Menu (41).



User variants

In addition to the browse option, users also have the option of uploading a list of variants on the "User variants" (42) menu option.

On this page, one is required to select one of the protein structures available (43) and submit a plain text file (.txt) with one mutation per line. A sample file with mutations for ORF3a (QHD43417) is also available for download (44).
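
Before uploading, it can help to check the file locally. The sketch below validates a text file with one mutation per line. The mutation notation it expects (wild-type residue, position, mutant residue, e.g. A10T) is an assumption, as this page does not specify the format; the downloadable sample file is the authoritative reference. The function name is hypothetical.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed notation: one-letter wild-type residue, position, one-letter mutant residue.
MUTATION_RE = re.compile(rf"^([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])$")


def parse_mutations(text):
    """Split file contents into parsed mutations and rejected lines.

    Returns (valid, rejected), where valid holds (wild_type, position, mutant)
    tuples and rejected holds (line_number, original_line) tuples.
    """
    valid, rejected = [], []
    for line_number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        match = MUTATION_RE.match(line.upper())
        if match and int(match.group(2)) > 0 and match.group(1) != match.group(3):
            valid.append((match.group(1), int(match.group(2)), match.group(3)))
        else:
            rejected.append((line_number, line))
    return valid, rejected


# Example with made-up file contents.
valid, rejected = parse_mutations("A10T\n\nxyz\nG15G\n")
print(valid)
print(rejected)
```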



On the results page, the Interactive 3D Viewer allows the user to view their protein of interest with various different options allowing them to assess the impact of missense variants on the structure. In particular, the screenshot option (51) enables the user to create informative images according to their preferred property, e.g. conservation. Different features are outlined below:

  • Background: Users can choose between a white and black background (46).
  • Representation: Users can select between the available visual representations, similar to PyMOL viewing options (47).
  • Color Scheme: The protein structure and its components can be colored according to various generated features, such as hydrophobicity, electrostatic potential and epitopes present. Once a scheme is selected, a colored surface view is generated, and users can modify the opacity of this surface using a slider (48).

Users can interact with the viewer (50) using mouse controls, or by interacting directly on touch-enabled devices.

  • Rotation: Left-click and drag
  • Zooming in and out: Scrolling with the middle mouse button.
  • Drag the structure: Right-click and drag.
  • Re-center: Middle mouse button click will re-center the viewer on a specific atom/residue.

Ticking the "Users variants" option (49) shows the mutated residues from the user's input as sticks on the selected protein structure.



Input mutations are also shown on the sequence viewer (52) and the lollipop plot (53) at the bottom of the "Sequence" section.



The Mutations table (54) summarises all physicochemical properties and mutation effects. Click the "Details" button (55) to analyse individual mutations.