This week is all about proteins! The homework is divided into two parts. Part A focuses on protein analysis and protein informatics. Part B is a fun introduction to the challenging world of protein folding.
In this part of the homework, you will be using online resources and 3D visualization software to answer questions about proteins.
Answer any of the following questions
Pick any protein (from any organism) of your interest that has a 3D structure and answer the following questions.
Briefly describe the protein you selected and why you selected it.
Identify the amino acid sequence of your protein.
How long is it? What is the most frequent amino acid?
How many protein sequence homologs are there for your protein?
Hint: Use the protein BLAST tool (blastp) to search for homologs and Clustal Omega to align and visualize them.
Does your protein belong to any protein family?
Identify the structure page of your protein in the RCSB Protein Data Bank (PDB).
Open the structure of your protein in any 3D molecule visualization software:
PyMOL (https://pymol.org/edu/?q=educational): A user-sponsored molecular visualization system on an open-source foundation, maintained and distributed by Schrödinger.
Chimera: A highly extensible program for interactive visualization and analysis of molecular structures and related data, including density maps, supramolecular assemblies, sequence alignments, docking results, trajectories, and conformational ensembles.
VMD: A molecular visualization program for displaying, animating, and analyzing large biomolecular systems using 3D graphics and built-in scripting.
NGLViewer: NGL Viewer is a collection of tools for web-based molecular graphics. WebGL is employed to display molecules like proteins and DNA/RNA with a variety of representations.
BLAST: BLAST finds regions of similarity between biological sequences. The program compares nucleotide or protein sequences (protein BLAST, blastp) to sequence databases and calculates the statistical significance of the matches.
Clustal Omega: A new multiple sequence alignment program that uses seeded guide trees and HMM profile-profile techniques to generate alignments between three or more sequences.
The BLOSUM matrices are used during alignment to score how similar amino acids are to each other. Here is the BLOSUM62 matrix, the one most commonly used when no a priori information is available about the evolutionary relationship of the protein sequences.
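For the sequence questions in Part A (how long the protein is and which amino acid is most frequent), a few lines of Python are enough once you have the sequence as text. This is only a minimal sketch: the sequence below is a made-up placeholder, and the function name composition is our own, not part of any tool listed above.

```python
from collections import Counter

# Hypothetical placeholder sequence, for illustration only. Replace it with
# the sequence of the protein you selected (for example, from its FASTA record).
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"


def composition(seq):
    """Return the sequence length and the residue counts, most frequent first."""
    cleaned = "".join(seq.split()).upper()  # drop whitespace and line breaks
    counts = Counter(cleaned)
    return len(cleaned), counts.most_common()


length, ranked = composition(sequence)
print(f"Length: {length} aa")
# Show the three most frequent residues with their share of the sequence.
for residue, count in ranked[:3]:
    print(f"{residue}: {count} ({100 * count / length:.1f}%)")
```

Pasting a multi-line FASTA body works as well, because whitespace is stripped before counting. Remove the header line (the one starting with ">") first.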
In this part you will fold protein sequences into 3D structures. The goal is to understand how computational protein modeling works and to see firsthand the computing power that molecular simulations in biology require.
For questions 1 and 2 you will be using the Python version of the Rosetta protein structure prediction software, while for question 3 (extra credit) you can use any of the available software listed in the resources.
The files for this exercise are available to clone or download from the following GitHub repository: https://github.com/thrakar9/protein_folding_workshop.
Folding a small (30 aa) peptide. Follow the "Setting up PyRosetta" instructions below and make sure you have a working PyRosetta installation.
a. Open the "Protein Folding with Pyrosetta" Jupyter notebook. Run the code in the notebook interactively and answer the questions therein. When you are done, save the notebook (with the answers and all outputs) to an HTML file, and link it to your class page.
b. Pick the lowest energy model and structurally (visually) compare it to the native. How close is it to the native? If it's different, what parts did the computer program get wrong? Note: To compare the structures, you first have to align them to the native. You can do that very easily in PyMOL. Here is a short video tutorial on aligning structures with PyMOL.
c. Pick the lowest RMSD model and structurally compare it to the native. How close is it to the native? If it's different from the lowest energy model, how is it different? Remember that in a blind case, we will not have the benefit of an RMSD column.
Fold your own sequence! In question 1 we used the sequence from a human protein as input to the folding algorithm. Yet, in principle, you can give any arbitrary sequence of amino acids as an input.
a. Use any process to create a sequence of 30-50 amino acids, and predict its 3D structure using the notebook from Q1. You can try running the script with multiple parameter combinations and compare the results. Log the parameters that gave the best outcome.
b. Compare the resulting structures of 2(a) with those from question 1. Do the structures in both cases look protein-like? If not, can you think of an explanation?
c. Try folding multiple sequences to come up with the most protein-looking structure!
Folding protein homologs (extra credit). For this exercise you will be running multiple protein folding simulations. If you don't have access to a powerful machine, use any of the folding servers listed in the resources.
a. Take the protein sequence from question 1 and randomly change 5 letters to any other amino acid. Predict the protein structure of the unedited (probably done already in Q1) and edited protein and compare the results. Did the changes you introduced change the structure significantly?
b. Take the original sequence from Q1 again and now change 5 letters to favorable alternatives according to the BLOSUM matrix. Predict the protein structure for the new sequence and compare with the results of 3(a). Did the new changes have the same effect on the structure?
c. Using the BLOSUM matrix as a guide, try to introduce as many changes as possible to the protein sequence without significantly changing its structure.
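The sequence edits in questions 2 and 3 can be scripted instead of done by hand. The sketch below is one possible approach, not the method used in the notebook. All function names are our own, and the CONSERVATIVE table is a small hand-picked subset of residue pairs that score positively in BLOSUM62, included only as an assumption for illustration. Check every swap against the full matrix before you use it.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Illustrative subset of substitutions with a positive BLOSUM62 score.
# This is an assumption for the example, not the full matrix.
CONSERVATIVE = {
    "I": "VL", "L": "IM", "V": "IL", "M": "LI",
    "K": "R", "R": "K", "D": "E", "E": "DQ", "Q": "E",
    "N": "D", "S": "T", "T": "S", "F": "Y", "Y": "FW", "W": "Y",
}


def random_sequence(length, rng):
    """Question 2(a): draw each residue uniformly from the 20 amino acids."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def mutate_random(seq, n, rng):
    """Question 3(a): change n distinct positions to a different residue."""
    residues = list(seq)
    for pos in rng.sample(range(len(residues)), n):
        choices = [aa for aa in AMINO_ACIDS if aa != residues[pos]]
        residues[pos] = rng.choice(choices)
    return "".join(residues)


def mutate_conservative(seq, n, rng):
    """Question 3(b): change n distinct positions using the CONSERVATIVE table.

    Residues without an entry in the table are never touched. A ValueError is
    raised if fewer than n positions are eligible.
    """
    residues = list(seq)
    eligible = [i for i, aa in enumerate(residues) if aa in CONSERVATIVE]
    for pos in rng.sample(eligible, n):
        residues[pos] = rng.choice(CONSERVATIVE[residues[pos]])
    return "".join(residues)


rng = random.Random(0)  # fixed seed so the same edits can be reproduced
original = random_sequence(40, rng)  # stand-in; use the Q1 sequence for question 3
random_variant = mutate_random(original, 5, rng)
conservative_variant = mutate_conservative(original, 5, rng)

print("original:    ", original)
print("random:      ", random_variant)
print("conservative:", conservative_variant)
```

Uniformly random sequences are a deliberately naive starting point for question 2. For question 3(c), mutate_conservative can be called with a larger n, or repeatedly on its own output, to push the number of changes up step by step.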
Download and install Anaconda.
Create a Python 3.6.8 virtual environment with conda
conda create -n protein_design python=3.6.8
Run python. You should see an output similar to this:
Python 3.6.8 | Anaconda, Inc. | (default, Dec 29 2018, 19:04:46)
[GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Download and install PyRosetta
Select the Python 3.6 version for your system
Download. Username: levinthal and Password: paradox
Note: This combination of username/password is only for academic use.
Activate the virtual environment we created above: conda activate protein_design
Extract and install PyRosetta to the environment.
tar -vjxf PyRosetta-<version>.tar.bz2
cd setup && python3.6 setup.py install
python -c "import pyrosetta; pyrosetta.init()"
Jupyter Notebooks are simply amazing. If you haven't used them before, today is your lucky day. Some resources:
To save a notebook as HTML, use File->Download as->HTML.