Welcome to Bio3D-web

- Online analysis of user defined protein structure ensembles -

1. SEARCH tab
Identify and select PDB structures related to your input query protein.

2. ALIGN tab
Align selected PDB structures and explore sequence similarity and conservation.

3. FIT tab
Superimpose selected structures and explore their invariant structural core and conformational differences.

4. PCA tab
Explore the major conformational features, inter-conformer relationships and structural variability of the selected ensemble.

5. eNMA tab
Compare the predicted large-scale motions and local flexibility of all user selected structures.


Overview

Bio3D-web is a new online application for the user-friendly investigation of protein structure ensembles.

Major functionality allows you to map and explore the structural, conformational and internal dynamic properties of proteins for which there are high resolution structures available.


Tutorial (PDF)

Example Applications

Heterotrimeric G-proteins
Heterotrimeric G-protein alpha-subunits undergo cycles of nucleotide-dependent conformational rearrangements to couple cell surface receptors to downstream effectors and signaling cascades that control diverse cellular processes. Important conformational transitions occurring at each stage of this cycle have been characterized from extensive crystallographic studies.

Principal component analysis (PCA) using Bio3D-web of 53 available G-alpha crystallographic structures identifies three major conformationally distinct states. These correspond to active GTP-analogue (black), inactive GDP (red) and inhibited GDI (green) bound structures.

Ensemble normal mode analysis (eNMA) reveals areas of conserved dynamics interspersed with key areas of significantly distinct flexibilities in the different states. Specifically, the P-loop and switch I, switch II and switch III regions are predicted to be significantly more flexible in the inactive state.

Yao et al. (2016)

GroEL
GroEL is a molecular chaperone that aids in the folding of a wide range of essential proteins in bacteria. ATP-dependent allosteric transitions and large-scale conformational changes are thought to underlie the GroEL stimulated folding process.

Using Bio3D-web we can readily identify, collect and analyze over 550 available GroEL subunit structures. Principal component analysis (PCA) clearly shows the relationship between these structures in terms of their major conformational variability. Three major conformational groups are apparent corresponding to apo closed forms (red), open ADP-bound forms (black), and ATP-bound states (green).

Ensemble normal mode analysis (NMA) of representative structures shows distinct fluctuation patterns for the three states with an enhanced mobility of the apical domain of the open state. The motions characterized from NMA display a high similarity with the conformational change described by the first principal component.

This demonstrates how using Bio3D-web we can effectively rationalize the heterogeneity in large structural sets in a manner that reveals functionally and mechanistically important inter-conformer relationships.

Skjaerven et al. (2011)

Aromatic amino acid hydroxylases
The aromatic amino acid hydroxylases catalyze the hydroxylation of aromatic amino acids L-Phe (phenylalanine hydroxylase; PAH), L-Tyr (tyrosine hydroxylase; TH) and L-Trp (tryptophan hydroxylase; TPH1 and TPH2). Their activity is tightly regulated to ensure viable concentrations of substrate and product. Allostery seems to play an important role in this regulation, but the underlying mechanisms remain largely unknown.

A Bio3D-web analysis of structural displacements across this enzyme family reveals a specific molecular mechanism of allostery with origin in the catalytic domains. Principal component analysis (PCA) shows that substrate binding is associated with a sub-domain closing motion over the active site within the catalytic domain. Ensemble normal mode analysis (NMA) indicates that this motion is likely important for regulation of hydroxylation throughout the enzyme family, and provides insight into how signals are transmitted to neighboring subunits.

Skjaerven et al. (2014)

Live Demos

Kinesin
Kinesin superfamily members play important roles in many diverse cellular processes, including cell motility, cell division, intracellular transport, and regulation of the microtubule cytoskeleton. Here we analyze the superfamily defining kinesin motor domain. These motor domains allosterically couple cycles of ATP hydrolysis to cycles of microtubule binding and conformational changes that result in directional force and movement on microtubules.

Maltose Binding Protein
Maltose-binding protein (MBP) is a bacterial protein involved in nutrient uptake. All MBPs have a characteristic two-domain architecture with a central interdomain ligand-binding cleft. Here we analyze available MBP structures to better understand ligand binding mechanisms in MBPs.

LeuT
Monoamine transporters (MATs) function by coupling ion gradients to the transport of dopamine, norepinephrine, or serotonin. Despite their importance in regulating neurotransmission, the exact conformational mechanism by which MATs function remains elusive. Here we analyze available leucine transporter (LeuT) structures to assess important structural and dynamic features of MATs.

Adenylate kinase
Adenylate kinase (Adk) is a ubiquitous enzyme that functions to maintain the equilibrium between cytoplasmic nucleotides essential for many cellular processes. Adk operates by catalyzing the reversible transfer of a phosphoryl group from ATP to AMP. This reaction is accompanied by a well-studied rate limiting conformational transition of regions that close over the two nucleotide-binding sites. Here we analyze available Adk structures with Bio3D-web to reveal features of this transition.

Frequently Asked Questions

1. What is Bio3D-web and how does it differ from the Bio3D package?

Bio3D-web is a new online web server for the user-friendly exploratory analysis of protein sequence-structure-dynamics relationships. Bio3D-web is powered by the well-established Bio3D R package for structural bioinformatics (Grant et al. (2006)).

Bio3D-web does not require any installation or programming skills – you explore through an easy-to-use online interface. This is in contrast to the conventional Bio3D package, which typically requires installation on your own hardware, knowledge of the R-Bio3D language, and use of the Unix-like command line.


2. Is this just another online NMA server?

No. There are lots of excellent normal mode analysis servers already out there such as WebNM@, elNémo, AD-ENM, ANM and iMODS. These offer NMA calculation capabilities for individual biomolecular structures.

Bio3D-web is unique in providing more expansive and integrated functionality for the identification, comparison and detailed analysis of large homologous structure sets online. We designed Bio3D-web to increase the accessibility and decrease the entry barrier to performing advanced comparative sequence, structure and dynamics analysis across user-defined structure sets, not just single structures.


3. Why not just use Bio3D proper or ProDy?

Both Bio3D and ProDy (a related Python library developed by others) offer much more functionality than is provided in this online application. If you can use these packages productively then we encourage you to do so. One of our main motivations for developing Bio3D-web was to allow new users to be productive with methods like PCA and eNMA without having to first learn Bio3D usage. We hope you will find Bio3D-web useful and then feel motivated to use the conventional Bio3D R package on your own computers and for your own custom analysis.


4. What is PCA and why is it useful for analysis of biomolecular structures?

Principal component analysis (PCA) is a well-established statistical method that is most commonly used as a dimensionality reduction technique for multivariate data analysis - that is, input data that has many dimensions, e.g. many different atomic coordinates that have been measured for multiple experimental structures.

Essentially, the PCA performed here aims to succinctly ‘map out’ the conformational relationships in large sets of protein structures and provide quantitative insight into the structural regions that contribute to any distinct conformations identified.

More explicitly, Bio3D-web utilizes PCA to provide a new condensed view of large structural datasets. This condensed view is basically a re-framing that retains the essence of the entire coordinate data. The new view is given in terms of what are known as principal components. These principal components are new directions in the data along which there is maximal variance - or more simply put, the directions where our set of structures differ most (i.e. are most spread out). The whole idea of PCA is to find these new directions of maximal variation in the coordinate data and use them to better understand major conformational features of the dataset.

The output of PCA includes a lower dimensional ‘reframing’ of your complete structure set that can simplify visualization and make it easier to uncover and further analyze interesting underlying structure relationships. These relationships can be hard to see in the original coordinate data (e.g. from just looking at superposed structures) because you might have many original dimensions to examine. PCA is also particularly useful as it allows you to qualitatively assess which regions of your structures are contributing most to the revealed structure relationships.
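As a rough illustration of the idea of a direction of maximal variance, the following sketch finds the first principal component of a small table of numbers using only the Python standard library. It is not the Bio3D implementation: the function name first_principal_component, the power-iteration approach and the toy data are assumptions made purely for illustration.

```python
import math
import random


def first_principal_component(rows, iterations=500, seed=0):
    """Return (direction, variance, scores) for the first principal component.

    rows: one list of numbers per observation (e.g. one flattened coordinate
    set per structure). Uses power iteration on the covariance matrix, which
    is an illustrative choice and needs at least two observations.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]

    # Random unit start vector; a fixed seed keeps the sketch reproducible.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    start_norm = math.sqrt(sum(x * x for x in v))
    v = [x / start_norm for x in v]

    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # no variance at all in the data
            break
        v = [x / norm for x in w]

    # Variance captured along v, and the projection of each observation on v.
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, variance, scores


# Toy data: four "structures" that differ only along the diagonal of x and y.
toy = [[0.0, 0.0, 0.0], [1.0, 1.0, 0.0], [2.0, 2.0, 0.0], [3.0, 3.0, 0.0]]
direction, variance, scores = first_principal_component(toy)
```

In this toy case the direction has equal x and y components and no z component, and the scores place the four observations along a single axis. The sign of a principal component is arbitrary, so the direction and scores may come out negated.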

We encourage users that are not familiar with PCA to look into many of the great resources available online (e.g. the Bio3D website).


5. What is eNMA and why is it useful for analysis of biomolecular structures?

Normal mode analysis (NMA) is a computational technique to characterize all possible deformations a protein can undergo. These motions are conveniently sorted with respect to the energy needed to deform the protein along the particular normal mode vector. This computational technique is particularly suited to probe large-scale collective motions typically associated with protein function.

NMA application most often involves the analysis of only a single protein structure. As the normal modes are sensitive to the specific protein conformation for which they are calculated, the exclusion of alternative protein conformations provides only a limited picture of the overall flexibility of the protein under different conditions.

A more complete picture of protein flexibility can be obtained by performing NMA across all structures in an ensemble in a way that facilitates the interpretation of structural similarity and dissimilarity trends. This allows a user to explore dynamic trends of all crystallized states in relation to each other without the conventional caveat of potentially over-interpreting the differences between extreme cases or a single artifactual structure. Furthermore, by carefully contrasting the fluctuation profiles one can provide new information on state specific global and local dynamics of potential functional relevance.


6. What are the major limitations of the current version of Bio3D-web?

A major current limitation is our restriction to analyzing only single chains from multi-chain PDB structures. Future versions of Bio3D-web will include the ability to perform single-chain, multi-chain and reconstructed biounit analysis. For now, if you would like to perform this type of analysis you should use the full Bio3D R package and the new biounit() function.

Another limitation is the comparatively slow performance of the ensemble normal mode analysis tab. Note that due to available hardware limitations we currently perform eNMA in series and thus restrict the total number of structures analyzed (even though our underlying code and approach are now parallelized). Bio3D-web currently runs entirely on a small virtual machine. We plan to improve performance of the eNMA tab to seconds even for many hundreds of structures by linking to suitable cluster computing resources. Please contact us if this is something you would like to do now.

Our 3D structure viewer has limited interactivity and does not render trajectories as movies. For example, you cannot click on a region of a structure to find out which residues are involved in a large-scale motion. We are currently using the inbuilt Bio3D view.pdbs() function via WebGL but are exploring using PV for future versions. In this regard please note that we provide links/buttons to view your superposed structure ensembles, PCA and eNMA results in PyMOL on your own computer. We also provide PDB file download options that should allow you to more comprehensively view your results in other powerful molecular viewers including VMD and Chimera.


7. How is the search for related protein structures performed?

The search is sequence based using pHMMER over the PDB database.


8. What is the 'BitScore of Alignment to Input Structure'?

The bit score describes the overall quality of the alignment between the query sequence and the search result. A high bit score corresponds to high sequence similarity. We report the bit scores from the HMMER search.


9. How is the invariant 'structural core' identified?

The algorithm iteratively refines an initial structural superposition determined from a multiple alignment. This involves iterated rounds of superposition, where at each round the position or positions displaying the largest differences are excluded from the dataset (Grant et al. (2006)).


10. Is the source code available and can I run Bio3D-web on my own hardware?

The complete Bio3D-web source code, like the underlying Bio3D package itself, is made fully available under a GPL2 license. Instructions for running on any computer running R are available here.


11. How are the principal components calculated?

The principal components are calculated from the superimposed coordinates, excluding the gap-containing columns (Skjaerven et al. (2014)).
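The exclusion of gap-containing columns can be pictured with a short sketch. It assumes a hypothetical layout in which each structure is a list with one entry per alignment position, holding either a coordinate tuple or None for a gap. The function name drop_gap_columns and this layout are illustrative assumptions, not the Bio3D data structures.

```python
def drop_gap_columns(coords):
    """Keep only alignment positions that are present in every structure.

    coords: one row per structure, one entry per aligned position;
    None marks a gap. Returns (kept_column_indices, filtered_rows).
    """
    n_cols = len(coords[0])
    keep = [j for j in range(n_cols)
            if all(row[j] is not None for row in coords)]
    filtered = [[row[j] for j in keep] for row in coords]
    return keep, filtered


# Two structures, three aligned positions; position 1 is a gap in the first.
aligned = [
    [(0.0, 0.0, 0.0), None,            (1.0, 1.0, 1.0)],
    [(0.0, 0.0, 1.0), (2.0, 2.0, 2.0), (1.0, 1.0, 2.0)],
]
kept, gap_free = drop_gap_columns(aligned)
```

Every structure then contributes the same number of coordinates, which is what allows the ensemble to be treated as a single rectangular data table for PCA.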


12. How are the normal modes calculated?

The normal modes for each structure in the representative ensemble are calculated according to Fuglebakk et al. (2012) (see also Skjaerven et al. (2014)). The C-alpha force field developed by Konrad Hinsen (Hinsen et al. (2000)) is used for normal mode calculation. For a discussion and comparison of other force fields and NMA methods see Yao et al. (2016).


13. How to cite Bio3D-web?

If you have used Bio3D-web, please consider citing the following references that describe this work:

  • Skjaerven et al. Online interactive analysis of protein structure ensembles with Bio3D-web (2016) Bioinformatics 32(22), 3510-3512. doi:10.1093/bioinformatics/btw482
  • Grant et al. Bio3D: An R package for the comparative analysis of protein structures (2006) Bioinformatics 22, 2695-2696
  • Skjaerven et al. Integrating protein structural dynamics and evolutionary analysis with Bio3D. (2014) BMC Bioinformatics 15, 399

14. How to contact us with your questions and suggestions?

Your questions and comments are important to us. If you like or do not like what we are doing, then please get in touch. Also, if there are new features that you would like to have added to the server, then drop us a line and we will see what we can do!

For the moment please use our BitBucket based issues tracker for any questions regarding the use of Bio3D-web or the larger Bio3D software package.

To expedite our response to your questions, please provide us with as much information as possible so that we can recreate the problem. Useful things to include are:

  • State if it is a problem with the Bio3D-web server or a local software installation.
  • Input data sufficient to recreate the problem.
  • Details of the web browser you are using and its version.

Many questions are also covered in the Bio3D documentation, so it may be worth browsing it before posting an issue.

Structure Search

A) Input Structure(s) or Sequence

Please enter either a single PDB code of interest, a single protein sequence, or multiple related PDB codes (see the Help page for more details).

Enter multiple comma ',' separated PDB IDs (4 character RCSB PDB codes with optional underscore chain, e.g. '1KJY_A')
Paste a protein sequence with no identifiers or FASTA headers.
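For readers preparing identifier lists in a script, the sketch below shows one way to check a comma-separated list against the format described above. The regular expression and the function name parse_pdb_ids are assumptions made for illustration. They are not taken from the Bio3D-web source, and the server's own validation may differ.

```python
import re

# Assumed pattern (not taken from the Bio3D-web source): a 4-character code
# starting with a digit, optionally followed by "_" and a chain identifier.
PDB_ID_PATTERN = re.compile(r"^([0-9][A-Za-z0-9]{3})(?:_([A-Za-z0-9]+))?$")


def parse_pdb_ids(text):
    """Split comma-separated input into (pdb_code, chain) tuples.

    chain is None when no underscore chain was given.
    """
    parsed = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate trailing or doubled commas
        match = PDB_ID_PATTERN.match(token)
        if match is None:
            raise ValueError(f"not a recognizable PDB identifier: {token!r}")
        code, chain = match.groups()
        parsed.append((code.upper(), chain))
    return parsed


ids = parse_pdb_ids("1KJY_A, 4ake")
```

Here the first entry keeps its explicit chain A, while the second has no chain and would therefore bring in all chains of that entry.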

Multiple PDB Summary

Annotation from PFAM of requested multiple PDB structures.

Protein Sequence Summary

Structure Summary

Input Structure Visualization

Note: Use your mouse to drag, rotate and zoom the structure

Input PDB Read Log

Overview

This site provides an online interface to several Bio3D tools for comparative protein structure analysis.

Methods include (1) Searching for related structures, (2) Alignment of selected structures, (3) Fitting based on rigid core positions, (4) PCA (principal component analysis) for inter-conformer characterization and (5) eNMA (ensemble normal mode analysis) for additional structure dynamic characterization.

Each of these analysis steps is implemented as consecutive tabs accessible from the top navigation bar.

Start your analysis by entering a PDB code of interest and then proceed by navigating through the above tabs or following the NEXT buttons.

For a detailed usage guide please download and consult the Bio3D-web tutorial [ PDF ].


B) Hit selection for further analysis

Optional filtering and refinement (via similarity threshold specification) of related structures for further analysis.

BLAST options not available for multiple PDB input


C) Optional filtering of related structures for further analysis

Optionally select (or de-select) structures by clicking to highlight their entries in the below table. This allows for finer grained selection than the sliders in panel B above.

Structure chain annotation

Input structures may consist of multiple chains. If not explicitly specified in panel A above (via using underscore chain in the input identifier e.g. '1KJY_A') all of these chains will be included below and be subject to further analysis.

Sequence Alignment

A) Alignment summary

Aligned positions are shown in gray with unaligned gap regions in white. The top red bar indicates the level of sequence conservation per position (see the Help page for further details).

Edit alignment

Individual structures can be filtered (i.e. removed) from the alignment by deleting their identifier from the list below. Alternatively, the alignment itself can be downloaded, edited and re-loaded via the Upload option.
Download FASTA file: To edit the alignment, download the FASTA alignment file, edit it, and upload it in the box to the right.
Download FASTA alignment file
Upload FASTA file: Important: when editing the FASTA file, do not edit the sequence identifiers, and do preserve the amino acid sequences.

B) Sequence alignment analysis

Sequence identity clustering results.
For details of the clustering methods implemented see the Details section of the hclust() man page.
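To make the quantity being clustered concrete, here is a minimal sketch of pairwise sequence identity computed from an alignment. It assumes that identity is the fraction of identical residues over positions where neither sequence has a gap. The exact definition used by the server may differ, and the function names are illustrative.

```python
def sequence_identity(seq_a, seq_b, gap="-"):
    """Fraction of identical residues over positions aligned in both sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumed convention: gap positions are not compared
        compared += 1
        if a == b:
            matches += 1
    return matches / compared if compared else 0.0


def identity_matrix(alignment):
    """Symmetric matrix of pairwise identities for a list of aligned sequences."""
    return [[sequence_identity(a, b) for b in alignment] for a in alignment]


matrix = identity_matrix(["AC-DE", "ACGDF", "ACGDE"])
```

A distance such as one minus identity can then be passed to a hierarchical clustering routine to group the structures by sequence similarity.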

C) Residue conservation

Sequence conservation per residue position.

Download Figures and Data

Conservation (PDF)
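As a simple illustration of a per-position conservation score, the sketch below reports, for each alignment column, the fraction of sequences that carry the most common residue. This identity-based score is an assumption chosen for brevity and is not necessarily the scoring method used by the server.

```python
from collections import Counter


def column_conservation(alignment, gap="-"):
    """Per-column fraction of sequences sharing the most common residue.

    Gaps count against conservation because the denominator is the total
    number of sequences (an assumed convention).
    """
    n_seqs = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)  # all-gap column
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / n_seqs)
    return scores


scores = column_conservation(["ACD", "AC-", "AGD"])
```

In this toy alignment the first column is fully conserved, while the second and third columns are each shared by two of the three sequences.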

D) Optional sequence alignment display

Rendering the alignment for display can be time consuming for large data sets (e.g. > 50 PDB structures of substantial sequence length).


Structure Superposition

A) Superposed PDB viewing options

From blue N-terminus to red C-terminus of the alignment.
From blue low variability to red most variable positions.
See RMSD cluster group colors assigned in panel B below.
Red structurally invariant core positions.
Red gap and black aligned positions.
Purple alpha helix and yellow beta strand.
Each individual molecule is assigned a distinct color.
Download options:
Aligned PDBs PyMOL Session

Filter/toggle displayed PDBs

Deselected entries in the table below will be hidden from the structure view in panel A above.

B) Initial structure analysis

For details of the clustering methods implemented see the Details section of the hclust() man page.

C) Residue fluctuations

Download PDF Figures

RMSF (PDF)

D) Structural core and RMSD details

Summary of invariant core

RMSD summary

Cluster representatives

Principal Component Analysis

A) Principal component visualization

End-point conformations of the PC trajectory are colored blue and red with interpolated intermediates gray.
From blue low variability to red most variable positions.
Interpolated conformations colored by frame number from blue through gray to red.
Download options:
PDB Trajectory PyMOL Session

B) Conformer analysis



Label options

PCA conformer plot annotation

Highlight structures in conformer plot by clicking their entries in the below table (only for plot type '2D Scatter').

C) Residue contributions

Download PDF Figures

Residue contributions (PDF)

Ensemble Normal Mode Analysis

A) Filter structures

Focus the structure ensemble by filtering out similar structures to reduce the computational load for NMA. Interactively (de)select structures by clicking the PC conformer plot.
(Dots highlighted with circles in the PC conformer plot will be included in the calculation).
(PDB IDs colored red in dendrogram will be included in the calculation).
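The idea of filtering out similar structures can be sketched as a greedy pass over a pairwise distance matrix, for example pairwise RMSD values. This is only one possible scheme, and both the function name filter_similar and the greedy strategy are assumptions. The server may select representatives differently, for example through clustering.

```python
def filter_similar(distances, cutoff):
    """Greedy redundancy filter over a symmetric pairwise distance matrix.

    A structure is kept only if it is at least `cutoff` away from every
    structure already kept. Returns the indices of the kept structures.
    """
    kept = []
    for i in range(len(distances)):
        if all(distances[i][j] >= cutoff for j in kept):
            kept.append(i)
    return kept


# Toy pairwise distances (e.g. RMSD): structures 0 and 1 are near-duplicates.
toy_distances = [
    [0.0, 0.2, 3.0],
    [0.2, 0.0, 3.1],
    [3.0, 3.1, 0.0],
]
representatives = filter_similar(toy_distances, cutoff=1.0)
```

With a cutoff of 1.0, structure 1 is dropped as redundant with structure 0, so normal modes would be calculated for two structures instead of three.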


B) Normal Modes Visualization


PDB Trajectory

C) Residue fluctuations

Fluctuations (PDF)

D) Compare PCA and NMA

The NMs of the selected 'reference' PDB are compared to the PCs derived from the ensemble of PDB structures. Note: Only the filtered structures are used in the comparison (panel A).
NMA-vs-PCA overlap (PDF)

E) Overlap analysis

The NMs of the selected 'reference' PDB are compared to the vector describing the difference in conformation between two selected structures (i.e. the reference structure and each of the selected PDBs from the table below).
Overlap (PDF)
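The overlap between a mode and a conformational difference vector is commonly defined as the absolute value of their normalized dot product. The sketch below uses that common definition with flattened coordinate vectors. The function names are illustrative, and the cumulative overlap assumes the modes are mutually orthogonal.

```python
import math


def overlap(mode, difference):
    """Absolute normalized dot product between a mode and a difference vector.

    Returns a value between 0 (orthogonal) and 1 (same direction).
    """
    dot = sum(m * d for m, d in zip(mode, difference))
    norm_mode = math.sqrt(sum(m * m for m in mode))
    norm_diff = math.sqrt(sum(d * d for d in difference))
    if norm_mode == 0.0 or norm_diff == 0.0:
        raise ValueError("vectors must be non-zero")
    return abs(dot) / (norm_mode * norm_diff)


def cumulative_overlap(modes, difference):
    """Square root of the summed squared overlaps (assumes orthogonal modes)."""
    return math.sqrt(sum(overlap(m, difference) ** 2 for m in modes))


# Toy example: the difference vector lies in the plane of the two modes.
modes = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
difference = [3.0, 4.0, 0.0]
per_mode = [overlap(m, difference) for m in modes]
total = cumulative_overlap(modes, difference)
```

An overlap near 1 for a single low-frequency mode suggests that this mode alone describes most of the observed conformational change.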

F) Cluster dendrogram

Dendrogram (PDF)

Cluster heatmaps

Row side colors correspond to RMSIP clustering for all plots below. Column side colors correspond to clustering from the NMA, FIT, and PCA tabs, respectively.
RMSIP heatmap
RMSD heatmap
PCA heatmap
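RMSIP (root mean square inner product) is a standard measure of how similar two sets of modes are. A minimal sketch under the usual definition follows, assuming both sets contain the same number of orthonormal vectors. The function name rmsip and the toy vectors are illustrative only.

```python
import math


def rmsip(modes_a, modes_b):
    """Root mean square inner product between two equally sized mode sets.

    Assumes each set holds orthonormal vectors. Returns 1.0 when the two
    sets span the same subspace and 0.0 when the subspaces are orthogonal.
    """
    if len(modes_a) != len(modes_b):
        raise ValueError("both sets must contain the same number of modes")
    total = 0.0
    for u in modes_a:
        for v in modes_b:
            dot = sum(x * y for x, y in zip(u, v))
            total += dot * dot
    return math.sqrt(total / len(modes_a))


set_one = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
set_two = [[0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
same = rmsip(set_one, set_one)       # identical subspaces
disjoint = rmsip(set_one, set_two)   # orthogonal subspaces
```

Computing this value for every pair of structures gives a similarity matrix of the kind that can be clustered and drawn as a heatmap.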

Download Options

Select a format for your summary report and click the Download button to save to your computer.
Download

Save or load your analysis

I want to:

Save your analysis

The results from your analyses will be saved on the server for 30 days.

Load your analysis

You can revisit your data by pasting the session ID in the box below.

Example: 2016-10-01_65d0f48c17e