3D binding pocket protein structure similarity virtual screening

1. Query structure and ligand

Query Structure

  • Users can choose query structures from the PDB or upload their own structure files.

  • The input is the 4-letter PDB code followed by a one-letter chain identifier, in the following format:
    • 1zaoA

    • The structure 1zao, chain A

    Ligand Name

  • The name of the query ligand should be specified exactly as it appears in the PDB file (columns 18-20 in the PDB format), in the following format:
    • ATP

    • The name of the query ligand is ATP

    Ligand Position

  • The position of the query ligand should be specified exactly as it appears in the PDB file (columns 23-27 in the PDB format, i.e. the residue number plus any insertion code), in the following format:
    • 286

    • The position of the query ligand is 286

    • 512C

    • The position of the query ligand is 512C

    Ligand Chain

  • The chain of the query ligand should be specified exactly as it appears in the PDB file (column 22 in the PDB format):
    • A

    • The chain of the ligand is A
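
The three ligand fields above can be read directly from a PDB record. The following is a minimal sketch, assuming the standard wwPDB fixed-column layout (residue name in columns 18-20, chain identifier in column 22, residue number plus insertion code in columns 23-27). The function names `ligand_key` and `select_ligand` are hypothetical and are not part of 3Dclick.

```python
from typing import List, Optional, Tuple


def ligand_key(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (name, position, chain) for an ATOM/HETATM record, else None."""
    if not line.startswith(("ATOM  ", "HETATM")) or len(line) < 27:
        return None
    name = line[17:20].strip()      # columns 18-20: residue (ligand) name
    chain = line[21].strip()        # column 22: chain identifier
    position = line[22:27].strip()  # columns 23-27: residue number + insertion code
    return name, position, chain


def select_ligand(pdb_lines: List[str], name: str, position: str, chain: str) -> List[str]:
    """Keep only the records that belong to the requested ligand."""
    wanted = (name, position, chain)
    return [ln for ln in pdb_lines if ligand_key(ln) == wanted]
```

For the query used throughout this page, the call would be `select_ligand(lines, "ATP", "286", "A")`, where `lines` holds the lines of the 1zao PDB file.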

    Choosing database

    Currently, users can run virtual screening over the following structural databases:

  • Small SCOP fold representatives of 252 protein chains

  • SCOP families of 2868 PDB chains

  • Proteins of 32,647 PDB chains from non-redundant (NR) 95 database

  • E.coli AlphaFold of 4363 structures

  • S.cerevisiae AlphaFold of 6039 structures

  • Human AlphaFold of 23,391 structures

  • The 3Dclick web server also lets users upload their own database if the default ones are inadequate for their queries. The format of a user database is as follows: the first line is the database type (protein), and each following line is either a 4-letter PDB code followed by a chain identifier or the identifier of an AlphaFold structure at EBI (https://alphafold.ebi.ac.uk/), e.g.
    • protein
      1kj9A
      1zaoA
      AF-Q5VST9-F32
      AF-Q5VST9-F30
      AF-Q9BVS4-F1
      AF-Q9Y6M4-F1
      AF-O14730-F1
    An uploaded database may contain at most 10,000 structures. Please contact us if you need to run a larger uploaded database.
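
Before uploading, a database file can be checked locally against the format described above. This is only a sketch: the two regular expressions are assumptions about what a valid entry looks like (a PDB code starting with a digit plus one chain character, and an `AF-<UniProt accession>-F<number>` identifier), and `parse_database` is a hypothetical helper, not something the server provides.

```python
import re
from typing import List

MAX_STRUCTURES = 10_000  # upload limit stated above

# Assumed entry patterns (not taken from the 3Dclick source code).
PDB_CHAIN = re.compile(r"[0-9][A-Za-z0-9]{3}[A-Za-z0-9]")
ALPHAFOLD = re.compile(r"AF-[A-Z0-9]{6,10}-F[0-9]+")


def parse_database(text: str) -> List[str]:
    """Validate an uploaded database file and return its structure entries."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines or lines[0] != "protein":
        raise ValueError("first line must be the database type 'protein'")
    entries = lines[1:]
    if len(entries) > MAX_STRUCTURES:
        raise ValueError(f"too many structures: {len(entries)} > {MAX_STRUCTURES}")
    for entry in entries:
        if not (PDB_CHAIN.fullmatch(entry) or ALPHAFOLD.fullmatch(entry)):
            raise ValueError(f"unrecognized entry: {entry!r}")
    return entries
```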

    Email address

  • A link to the results page will be sent to the email address provided.
  • Because jobs on large databases can take a long time, providing an e-mail address is recommended.
  • Input examples


    2. Scores of 3Dclick

    Structure Overlap

  • Given two protein structures/substructures A and B, structure overlap (also called equivalent positions) is defined as the percentage of the representative Cβ atoms in the structure A that are within 3.5Å of the corresponding atoms in the superimposed structure B.

    Root Mean Square Deviation

  • Root mean square deviation (RMSD) is the root of the mean squared distance between the two sets of coordinates of representative Cβ atoms, after superimposition. It is given by:

    \[ \mathrm{RMSD} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left\lVert x_i^A - x_i^B \right\rVert^2} \]

    • where N is the match size, and x_i^A and x_i^B are the Cartesian coordinates of the representative Cβ atoms of the i-th pair of structurally equivalent amino acid residues of proteins A and B.
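
Both scores can be computed from the matched atom pairs once the two structures are superimposed. The sketch below assumes the superimposition has already been done and that `pairs` holds the coordinates of structurally equivalent representative atoms. RMSD uses the usual root-mean-square definition over the N matched pairs. The function names and the `n_atoms_a` parameter (number of representative atoms in structure A) are illustrative choices, not 3Dclick's API.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]
Pair = Tuple[Point, Point]  # (atom of A, corresponding atom of superimposed B)


def structure_overlap(pairs: Sequence[Pair], n_atoms_a: int, cutoff: float = 3.5) -> float:
    """Percentage of representative atoms of A within `cutoff` Å of their partner in B."""
    close = sum(1 for a, b in pairs if math.dist(a, b) <= cutoff)
    return 100.0 * close / n_atoms_a


def rmsd(pairs: Sequence[Pair]) -> float:
    """Root mean square deviation over the N matched pairs."""
    n = len(pairs)
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in pairs) / n)
```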

    Z-score

  • For each comparison of a pair of 3D structures/substructures produced by 3Dclick, a Z-score is computed to determine the significance of the comparison:

    \[ Z = \frac{SO_{A\text{-}S_i} - \mathrm{avg\_SO}_{bg}}{\mathrm{std\_SO}_{bg}} \]

    • where

    • A is the query structure for which similar structures are sought in a database, whose members are the structures {Si}.
    • SO_{A-Si} is the average structure overlap (computed over the representative Cβ atoms) on the superimposition of A and Si within cut-off distances of 1 Å, 2 Å, and 3 Å.
    • avg_SObg and std_SObg are the average and standard deviation of the structure overlap within 1 Å, 2 Å, and 3 Å for the 3D structure comparisons produced by comparing all members of a background database with one another. Here, the background database is a non-redundant set of protein structures consisting of 1601 chains, and avg_SObg is computed from the all-against-all 3D structure comparisons of its members.
  • For a significant match, Z-score should be above 2.0 (Nguyen et al., J. Chem. Inf. Model., 2019). The greater the Z-score the more significant is the match.
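
As a worked illustration of the Z-score, the sketch below averages the structure overlap over the three cut-offs and standardizes it against a background distribution. The background values are whatever the caller supplies. Using the population standard deviation is an assumption, since the page does not say which estimator 3Dclick uses. The function names are hypothetical.

```python
import statistics
from typing import Sequence


def mean_overlap(overlaps_by_cutoff: Sequence[float]) -> float:
    """Average structure overlap over the 1 Å, 2 Å, and 3 Å cut-offs."""
    return statistics.fmean(overlaps_by_cutoff)


def z_score(so_query: float, background: Sequence[float]) -> float:
    """Standardize a query overlap against background overlaps (all-against-all)."""
    avg_bg = statistics.fmean(background)
    std_bg = statistics.pstdev(background)  # assumption: population standard deviation
    return (so_query - avg_bg) / std_bg


def is_significant(z: float, threshold: float = 2.0) -> bool:
    """Significance test; the output section of this page lists hits with Z-score >= 2."""
    return z >= threshold
```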

    Identical Residues

  • #Identical Residues is the number of identical residues among the structurally equivalent amino acid residues of proteins A and B.

    Good Contacts

  • #Good Contacts is the number of good contacts between the ligand and the target protein in their complex before the energy minimization step. Here, the complex is simply the coordinates of the ligand transferred onto the target protein after superimposing the structure of the target protein on the binding site of the ligand (see the Algorithm section). As defined in the Algorithm section, a contact between two non-hydrogen atoms of the ligand and the target protein is a good contact if its Contact score is >0.89 and ≤1.30.

    Bad Contacts

  • #Bad Contacts is the number of bad contacts between the ligand and the target protein in their complex before the energy minimization step. A contact between two non-hydrogen atoms of the ligand and the target protein is a bad contact if its Contact score is >0.75 and ≤0.89.

    Ugly Contacts

  • #Ugly Contacts is the number of ugly contacts between the ligand and the target protein in their complex before the energy minimization step. A contact between two non-hydrogen atoms of the ligand and the target protein is an ugly contact if its Contact score is ≤0.75.
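
The three contact classes partition the Contact score range, which the following minimal sketch makes explicit. It takes Contact scores as given (their computation is described in the Algorithm section and is not reproduced here). `classify_contact` and `count_contacts` are hypothetical names.

```python
from collections import Counter
from typing import Iterable, Optional


def classify_contact(score: float) -> Optional[str]:
    """Map a Contact score to 'ugly', 'bad', or 'good'; None if above 1.30."""
    if score <= 0.75:
        return "ugly"
    if score <= 0.89:
        return "bad"
    if score <= 1.30:
        return "good"
    return None  # too far apart to count as a good, bad, or ugly contact


def count_contacts(scores: Iterable[float]) -> Counter:
    """Count contacts per class, ignoring scores outside all three ranges."""
    labels = (classify_contact(s) for s in scores)
    return Counter(label for label in labels if label is not None)
```

For example, a Contact score of 0.66 falls in the ugly range, while a score of 1.32 lies just above the good range and is not counted in any class.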

    Residues of Target

  • #Residues of Target is the number of amino acid residues of the target protein.

    Good Contacts after Minimization

  • #Good Contacts after Minimization is the number of good contacts between the ligand and the target protein in their complex after the energy minimization step.

    Bad Contacts after Minimization

  • #Bad Contacts after Minimization is the number of bad contacts between the ligand and the target protein in their complex after the energy minimization step.

    Ugly Contacts after Minimization

  • #Ugly Contacts after Minimization is the number of ugly contacts between the ligand and the target protein in their complex after the energy minimization step.

    Energy before Minimization

  • Energy before Minimization is calculated using AmberTools22 for the complex of the ligand and the target protein before the energy minimization step.

    Energy after Minimization

  • Energy after Minimization is calculated for the complex of the ligand and the target protein after the energy minimization step. The aim of this step is to remove ugly and bad contacts from the complex where possible (http://ambermd.org/tutorials/basic/tutorial4b/). 3Dclick first uses AmberTools22 to generate parameter and coordinate files for the ligand using the General AMBER Force Field (GAFF2) (He et al., J. Chem. Phys., 2020). It then applies a total of 200 steps of energy minimization, the first 50 by steepest descent and the remaining 150 by conjugate gradient, using the generalized Born (GB) implicit-solvent model of Hawkins et al. (Hawkins et al., J. Phys. Chem., 1996) with a cutoff of 16 Å for nonbonded pairs (http://ambermd.org/tutorials/basic/tutorial4b/).
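
For readers who want to reproduce a comparable minimization locally, the sketch below writes a sander-style input file with the settings described above. It is an assumption that these particular namelist values (`imin`, `maxcyc`, `ncyc`, `ntb`, `igb`, `cut`) match what 3Dclick runs. They are simply the standard Amber options that correspond to 200 steps, 50 of them steepest descent, the Hawkins-Cramer-Truhlar GB model, and a 16 Å cutoff. `minimization_input` is a hypothetical helper.

```python
def minimization_input(total_steps: int = 200,
                       steepest_descent_steps: int = 50,
                       cutoff: float = 16.0) -> str:
    """Build the text of a sander minimization input (assumed settings, see lead-in)."""
    lines = [
        "Minimization of ligand-target complex (implicit solvent)",  # title line
        "&cntrl",
        "  imin=1,",                           # run energy minimization
        f"  maxcyc={total_steps},",            # total number of minimization steps
        f"  ncyc={steepest_descent_steps},",   # steepest descent first, then conjugate gradient
        "  ntb=0,",                            # no periodic boundary (implicit solvent)
        "  igb=1,",                            # generalized Born model of Hawkins et al.
        f"  cut={cutoff},",                    # nonbonded cutoff in Å
        "/",
        "",
    ]
    return "\n".join(lines)
```

Writing the returned string to a file such as `min.in` gives an input that can be passed to sander together with the parameter and coordinate files of the complex.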


    3. Output of 3Dclick

    The list of possible target proteins/hits for the query ligand before energy minimization

      3Dclick displays the output table of the 3D structure similarity virtual screening for the binding pocket/site of the ligand ATP (Adenosine triphosphate) from the structure of A.fulgidus Rio2 Kinase (PDB code: 1zaoA) on the uploaded database (as shown in Example 7). The target proteins are ranked by Z-score, and 3Dclick lists the possible target proteins with significant Z-scores (Z-score ≥2). In this example, there are nine possible target proteins with Z-score ≥2 from the PDB (Protein Data Bank) and the AlphaFold structures at EBI (https://alphafold.ebi.ac.uk/). A summary of a target protein is shown when users click on its Target Structure in the output table. For each possible target protein, 3Dclick also reports Structure Overlap, RMSD, Number of Identical Residues, and Number of Residues of Target Protein, as well as Number of Good Contacts, Number of Bad Contacts, and Number of Ugly Contacts between the ligand and the target protein in their complex. For each target protein, 3Dclick constructs a model of the complex with the ligand in its new putative binding site. In addition, 3Dclick visualizes the 3D structure of each complex using NGL (https://nglviewer.org/) when users click on the "3D view" of its Complex Structure in the output table.

    Visualization of the complex structure of ligand and possible target protein before energy minimization

    • Complex structure before energy minimization
    • 3Dclick displays the visualization of 3D structures for the complex of ligand ATP and possible target protein of human AlphaFold structure of Obscurin (AF-Q5VST9-F32 with 1400 residues) using NGL (https://nglviewer.org/) when users click on "3D view" of the possible target AF-Q5VST9-F32 in the output table of Example 7. As shown in the output table, the human AlphaFold structure of Obscurin (AF-Q5VST9-F32) is identified as the target protein of ATP that has similarity scores with the ATP binding site of A.fulgidus Rio2 Kinase (Z-score = 5.16; Structure Overlap = 84.78%; RMSD = 1.91 Å). In this visualization, the ligand ATP is shown in sticks while the target protein of human AlphaFold structure of Obscurin is shown in ribbons.

    • Complex structure with contact residues before energy minimization
    • When users click on the "Contact Residues ON/OFF" button in the visualization of AF-Q5VST9-F32 from the output table of Example 7, 3Dclick uses NGL (https://nglviewer.org/) to display the complex of the ligand ATP and the possible target protein, the human AlphaFold structure of Obscurin (AF-Q5VST9-F32 with 1400 residues), together with their contact residues (within 5 Å). As shown in the output table, before energy minimization this complex has 130 Good Contacts, 15 Bad Contacts, and 11 Ugly Contacts. The visualization shows one ugly contact, between the O1B atom of ATP and the OD2 atom of ASP407 of Obscurin, with a distance of 1.95 Å and a Contact score of 0.66 before energy minimization.

    The list of possible target proteins/hits for the query ligand after energy minimization

      When users click on the link "Energy minimization step for the top hits and ligand" on the output page of the 3D structure similarity virtual screening for the binding pocket/site of the ligand ATP from the structure of A.fulgidus Rio2 Kinase (PDB code: 1zaoA) on the uploaded database (as shown in Example 7), 3Dclick displays the output table of possible target proteins that have successfully completed the energy minimization step and have Number of Ugly Contacts = 0. In this example, after the energy minimization step there are eight possible target proteins with Number of Ugly Contacts = 0 from the PDB (Protein Data Bank) and the AlphaFold structures at EBI (https://alphafold.ebi.ac.uk/). Compared to the output table before energy minimization, only one target protein, Oxygenase-reductase (PDB code: 4ospA, with Number of Ugly Contacts before Minimization = 66), has not completed the energy minimization step. A summary of a target protein is shown when users click on its Target Structure in the output table. For each possible target protein, 3Dclick also reports Number of Good Contacts after Minimization, Number of Bad Contacts after Minimization, and Number of Ugly Contacts after Minimization, as well as Energy before Minimization and Energy after Minimization, for the complex of the ligand and the target protein. In addition, 3Dclick visualizes the 3D structure of each complex after energy minimization using NGL (https://nglviewer.org/) when users click on the "3D view" of its Complex Structure in the output table.

    Visualization of the complex structure of ligand and possible target protein after energy minimization

    • Complex structure after energy minimization
    • When users click on "3D view" of the possible target AF-Q5VST9-F32 in the output table after the energy minimization step of Example 7, 3Dclick displays the visualization of 3D structures for the complex of ligand ATP and possible target protein of human AlphaFold structure of Obscurin (AF-Q5VST9-F32 with 1400 residues) after energy minimization using NGL (https://nglviewer.org/). In this visualization, the ligand ATP is shown in sticks while the target protein of human AlphaFold structure of Obscurin is shown in ribbons.

    • Complex structure with contact residues after energy minimization
    • When users click on the "Contact Residues ON/OFF" button in the visualization of AF-Q5VST9-F32 from the output table after energy minimization of Example 7, 3Dclick uses NGL (https://nglviewer.org/) to display the minimized complex of the ligand ATP and the possible target protein, the human AlphaFold structure of Obscurin (AF-Q5VST9-F32 with 1400 residues), together with their contact residues (within 5 Å). As shown in the output table after energy minimization, this complex has 100 Good Contacts, 1 Bad Contact, and 0 Ugly Contacts, compared to 15 Bad Contacts and 11 Ugly Contacts before energy minimization. In this visualization, the ugly contact between the O1B atom of ATP and the OD2 atom of ASP407 of Obscurin before energy minimization has become a contact with a distance of 3.91 Å and a Contact score of 1.32 after energy minimization.

      The energy minimization step usually takes a long time. For example, minimizing the complex of ATP with the human AlphaFold structure of Obscurin (AF-Q5VST9-F32 with 1400 residues), using the binding site of A.fulgidus Rio2 Kinase, takes 13 minutes, whereas the 3D comparison of these two structures took 3Dclick only 2 seconds. The 3Dclick web server therefore currently shortlists the top 100 hits/target proteins with significant Z-scores (Z-score ≥2) and uses them for the energy minimization step. Please contact us if you would like more possible target proteins to be minimized for your ligands, small molecules, or drugs.
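
The selection logic described in this section (rank by Z-score, keep significant hits, cap at 100 before minimization, then keep complexes with no ugly contacts afterwards) can be sketched as follows. The `Hit` record and both function names are hypothetical, and representing a minimization that did not complete as `None` is an assumption made for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    target: str
    z_score: float
    # Ugly contacts after minimization; None if minimization did not complete.
    ugly_after_min: Optional[int] = None


def shortlist(hits: List[Hit], z_min: float = 2.0, limit: int = 100) -> List[Hit]:
    """Top `limit` hits with Z-score >= z_min, ranked by decreasing Z-score."""
    significant = [h for h in hits if h.z_score >= z_min]
    return sorted(significant, key=lambda h: h.z_score, reverse=True)[:limit]


def clean_after_minimization(hits: List[Hit]) -> List[Hit]:
    """Hits that completed minimization and have no ugly contacts left."""
    return [h for h in hits if h.ugly_after_min == 0]
```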


    4. Mouse controls for 3Dclick visualization

    Displaying atom name

      By moving the mouse pointer over any atom in the visualization window of 3Dclick, the corresponding atom name is shown. As seen in the above figure, the name of sidechain atom OD2 of ASP407 of human AlphaFold structure of Obscurin (AF-Q5VST9-F32) is displayed.

    Measuring distance between two atoms

    • Selecting two atoms
    • Right click on the first atom. A green sphere wrapping the first atom is shown.
    • Right click on the second atom. Two green spheres wrapping these two atoms are shown. As seen in the above figure, the O1B atom of ATP and the OD2 atom of ASP407 of the human AlphaFold structure of Obscurin (AF-Q5VST9-F32) are selected.

    • Calculating the distance of two selected atoms
    • After selecting the two atoms, right click again on the second selected atom to measure the distance between them. A green dashed line between the two atoms is shown, along with their distance in Å. As displayed in the above figure, the distance between the O1B atom of ATP and the OD2 atom of ASP407 of the human AlphaFold structure of Obscurin (AF-Q5VST9-F32) is 1.95 Å.

    Moving, rotating, and zooming in/out of the complex structure of ligand and target protein

      To move, rotate, and zoom in/out of the complex structure of ligand and target protein in the visualization window of 3Dclick, use the following mouse controls:

    • Moving: right click + drag
    • Rotating: left click + drag
    • Z-axis rotation: Ctrl + right click + drag
    • Zoom in/out: scroll wheel
    • Center view: left click on the desired atom


    5. Running time

    The running time increases with increase in:

  • size of database
  • number of protein-ligand binding residues
  • number of best matched cliques for each comparison

    The table below lists running times for the query ligand ATP (Adenosine triphosphate) at position 286, chain A, from the structure of A.fulgidus Rio2 Kinase with ATP (PDB code: 1zaoA) using different databases. Note that these are estimated running times; jobs may also queue up on the server.

    Database                                    Running time
    Small SCOP fold representatives             4 minutes 40 seconds
    SCOP families (2868 PDB chains)             15 minutes 20 seconds
    Protein (32,647 PDB chains NR 95)           1 hour 23 minutes
    E.coli AlphaFold (4363 structures)          35 minutes 44 seconds
    S.cerevisiae AlphaFold (6039 structures)    1 hour 1 minute
    Human AlphaFold (23,391 structures)         1 hour 44 minutes

    6. Browser compatibility

    OS        Version   Chrome           Firefox   Microsoft Edge   Safari
    MacOS     Ventura   109.0.5414.119   109.0     n/a              16.3
    Windows   11        109.0.5414.120   108.0.2   109.0.1518.61    n/a
    Ubuntu    22.04.1   n/a              103.0.1   n/a              n/a
