This notebook contains material from PyRosetta; content is available on Github.

Score Function Basics

Keywords: score function, ScoreFunction(), get_score_function(), set_weight(), show(), etable_atom_pair_energies(), Atom objects, get_hbonds(), nhbonds(), residue_hbonds()

In [ ]:
# Notebook setup
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.mount_pyrosetta_install()
    print ("Notebook is set for PyRosetta use in Colab.  Have fun!")

import pyrosetta
pyrosetta.init()

Make sure you are in the directory with the pdb files:

cd google_drive/My\ Drive/student-notebooks/

In this module, we will explore the PyRosetta score function interface. You will learn to inspect energies of a biomolecule at the whole protein, per-residue, and per-atom level. Finally, you will gain practice applying the energies to answering biological questions involving proteins. For these exercises, we will use the protein Ras (PDB 6q21). Either make sure you have the PDB file "6Q21_A.pdb" in your current directory, or if you have an Internet connection, load the pdb into a pose called ras with the pyrosetta.pose_from_pdb method.

In [1]:
### BEGIN SOLUTION
ras = pyrosetta.pose_from_pdb("inputs/6Q21_A.pdb")
### END SOLUTION
core.init: Checking for fconfig files in pwd and ./rosetta/flags
core.init: Rosetta version: PyRosetta4.Release.python36.mac r208 2019.04+release.fd666910a5e fd666910a5edac957383b32b3b4c9d10020f34c1 http://www.pyrosetta.org 2019-01-22T15:55:37
core.init: command: PyRosetta -ex1 -ex2aro -database /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database
core.init: 'RNG device' seed mode, using '/dev/urandom', seed=-331629677 seed_offset=0 real_seed=-331629677
core.init.random: RandomGenerator:init: Normal mode, seed=-331629677 RG_type=mt19937
core.chemical.GlobalResidueTypeSet: Finished initializing fa_standard residue type set.  Created 696 residue types
core.chemical.GlobalResidueTypeSet: Total time to initialize 1.22777 seconds.
core.import_pose.import_pose: File '6Q21_A.pdb' automatically determined to be of type PDB

To score a protein, you will begin by defining a score function using the get_score_function(is_fullatom: bool) method in the pyrosetta.teaching namespace. Specifying True will return the default ref2015 all-atom energy function, while False will specify the default centroid score function.

Create a PyRosetta score function using:

sfxn = get_score_function(True)
In [3]:
from pyrosetta.teaching import *

### BEGIN SOLUTION
sfxn = get_score_function(True)
### END SOLUTION
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015
core.scoring.etable: Starting energy table calculation
core.scoring.etable: smooth_etable: changing atr/rep split to bottom of energy well
core.scoring.etable: smooth_etable: spline smoothing lj etables (maxdis = 6)
core.scoring.etable: smooth_etable: spline smoothing solvation etables (max_dis = 6)
/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/teaching.py:13: UserWarning: Import of 'rosetta' as a top-level module is deprecated and may be removed in 2018, import via 'pyrosetta.rosetta'.
  from rosetta.core.scoring import *
core.scoring.etable: Finished calculating energy tables.
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBPoly1D.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBFadeIntervals.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/HBEval.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/DonStrength.csv
basic.io.database: Database file opened: scoring/score_functions/hbonds/ref2015_params/AccStrength.csv
basic.io.database: Database file opened: scoring/score_functions/rama/fd/all.ramaProb
basic.io.database: Database file opened: scoring/score_functions/rama/fd/prepro.ramaProb
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.all.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.gly.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.pro.txt
basic.io.database: Database file opened: scoring/score_functions/omega/omega_ppdep.valile.txt
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/P_AA_n
core.scoring.P_AA: shapovalov_lib::shap_p_aa_pp_smooth_level of 1( aka low_smooth ) got activated.
basic.io.database: Database file opened: scoring/score_functions/P_AA_pp/shapovalov/10deg/kappa131/a20.prop

You can see the terms, weights, and energy method options by printing the score function:

print(sfxn)
In [4]:
### BEGIN SOLUTION
print(sfxn)
### END SOLUTION
ScoreFunction::show():
weights: (fa_atr 1) (fa_rep 0.55) (fa_sol 1) (fa_intra_rep 0.005) (fa_intra_sol_xover4 1) (lk_ball_wtd 1) (fa_elec 1) (pro_close 1.25) (hbond_sr_bb 1) (hbond_lr_bb 1) (hbond_bb_sc 1) (hbond_sc 1) (dslf_fa13 1.25) (omega 0.4) (fa_dun 0.7) (p_aa_pp 0.6) (yhh_planarity 0.625) (ref 1) (rama_prepro 0.45)
energy_method_options: EnergyMethodOptions::show: aa_composition_setup_files: 
EnergyMethodOptions::show: mhc_epitope_setup_files: 
EnergyMethodOptions::show: netcharge_setup_files: 
EnergyMethodOptions::show: aspartimide_penalty_value: 25
EnergyMethodOptions::show: etable_type: FA_STANDARD_DEFAULT
analytic_etable_evaluation: 1
EnergyMethodOptions::show: method_weights: ref 1.32468 3.25479 -2.14574 -2.72453 1.21829 0.79816 -0.30065 2.30374 -0.71458 1.66147 1.65735 -1.34026 -1.64321 -1.45095 -0.09474 -0.28969 1.15175 2.64269 2.26099 0.58223
EnergyMethodOptions::show: method_weights: free_res
EnergyMethodOptions::show: unfolded_energies_type: UNFOLDED_SCORE12
EnergyMethodOptions::show: split_unfolded_label_type: SPLIT_UNFOLDED_MM
EnergyMethodOptions::show: split_unfolded_value_type: SPLIT_UNFOLDED_BOLTZ
EnergyMethodOptions::show: atom_vdw_atom_type_set_name: centroid
EnergyMethodOptions::show: exclude_protein_protein_fa_elec: false
EnergyMethodOptions::show: exclude_RNA_RNA_fa_elec: false
EnergyMethodOptions::show: exclude_RNA_protein_fa_elec: false
EnergyMethodOptions::show: exclude_monomer_fa_elec: false
EnergyMethodOptions::show: elec_max_dis: 5.5
EnergyMethodOptions::show: elec_min_dis: 1.6
EnergyMethodOptions::show: elec_die: 10
EnergyMethodOptions::show: elec_no_dis_dep_die: false
EnergyMethodOptions::show: elec_sigmoidal_die: true
EnergyMethodOptions::show: elec_sigmoidal_D: 80
EnergyMethodOptions::show: elec_sigmoidal_D0: 6
EnergyMethodOptions::show: elec_sigmoidal_S: 0.4
EnergyMethodOptions::show: smooth_fa_elec: true
EnergyMethodOptions::show: grpelec_fade_type: false
EnergyMethodOptions::show: grpelec_fade_param1: 1
EnergyMethodOptions::show: grpelec_fade_param2: 1
EnergyMethodOptions::show: grpelec_fade_hbond: 0
EnergyMethodOptions::show: grp_cpfxn: 1
EnergyMethodOptions::show: elec_group_file: /scoring/score_functions/elec_group_def.dat
EnergyMethodOptions::show: grpelec_context_dependent: 0
EnergyMethodOptions::show: use_polarization: true
EnergyMethodOptions::show: use_gen_kirkwood: true
EnergyMethodOptions::show: protein_dielectric: 1
EnergyMethodOptions::show: water_dielectric: 78.3
EnergyMethodOptions::show: exclude_DNA_DNA: true
EnergyMethodOptions::show: exclude_intra_res_protein: false
EnergyMethodOptions::show: count_pair_hybrid: false
EnergyMethodOptions::show: put_intra_into_total: false
EnergyMethodOptions::show: geom_sol_interres_path_distance_cutoff: false
EnergyMethodOptions::show: geom_sol_intrares_path_distance_cutoff: true
EnergyMethodOptions::show: eval_intrares_elec_ST_only: false
EnergyMethodOptions::show: envsmooth_zero_negatives: false
EnergyMethodOptions::show: cst_max_seq_sep: 18446744073709551615
EnergyMethodOptions::show: pb_bound_tag: bound
EnergyMethodOptions::show: pb_unbound_tag: unbound
EnergyMethodOptions::show: ordered_wat_penalty: 1.221
EnergyMethodOptions::show: ordered_pt_wat_penalty: 2.709
EnergyMethodOptions::show: voids_penalty_energy_containing_cones_cutoff_:6
EnergyMethodOptions::show: voids_penalty_energy_cone_distance_cutoff_: 8
EnergyMethodOptions::show: voids_penalty_energy_cone_dotproduct_cutoff_: 0.1
EnergyMethodOptions::show: voids_penalty_energy_voxel_grid_padding_: 1
EnergyMethodOptions::show: voids_penalty_energy_voxel_size_: 0.5
EnergyMethodOptions::show: voids_penalty_energy_disabled_except_during_packing_: TRUE
EnergyMethodOptions::show: hbnet_bonus_ramping_function_: "quadratic"
EnergyMethodOptions::show: hbnet_max_network_size_: 0
EnergyMethodOptions::show: approximate_buried_unsat_penalty_hbond_energy_threshold_: -0.25
EnergyMethodOptions::show: approximate_buried_unsat_penalty_burial_atomic_depth_: 4.5
EnergyMethodOptions::show: approximate_buried_unsat_penalty_burial_probe_radius_: 2.3
EnergyMethodOptions::show: approximate_buried_unsat_penalty_burial_resolution_: 0.5
EnergyMethodOptions::show: approximate_buried_unsat_penalty_oversat_penalty_: 1
EnergyMethodOptions::show: approximate_buried_unsat_penalty_assume_const_backbone_:1
EnergyMethodOptions::show: bond_angle_central_atoms_to_score:
EnergyMethodOptions::show: bond_angle_residue_type_param_set: none
HBondOptions::show: exclude_DNA_DNA: true
HBondOptions::show: exclude_intra_res_protein_: false
HBondOptions::show: exclude_intra_res_RNA_: false
HBondOptions::show: put_intra_into_total_: false
HBondOptions::show: exclude_self_hbonds: true
HBondOptions::show: use_hb_env_dep: false
HBondOptions::show: use_hb_env_dep_DNA: true
HBondOptions::show: smooth_hb_env_dep: true
HBondOptions::show: bb_donor_acceptor_check: true
HBondOptions::show: decompose_bb_hb_into_pair_energies: false
HBondOptions::show: params_database_tag_: ref2015_params
HBondOptions::show: use_sp2_chi_penalty_: true
HBondOptions::show: sp2_BAH180_rise_: 0.75
HBondOptions::show: sp2_outer_width_: 0.357
HBondOptions::show: measure_sp3acc_BAH_from_hvy_: true
HBondOptions::show: fade_energy_: 1
HBondOptions::show: exclude_ether_oxygens_: 0
HBondOptions::show: Mbhbond: false 
HbondOptions::show: mphbond: false
HBondOptions::show: hbond_energy_shift: 0
HBondOptions::show: water_hybrid_sf: false
RNA_EnergyMethodOptions::show: syn_G_potential_bonus: 0
RNA_EnergyMethodOptions::show: torsion_potential: ps_04282011
RNA_EnergyMethodOptions::show: suiteness_bonus: Richardson
RNA_EnergyMethodOptions::show: rna_base_pair_xy_filename: scoring/rna/rna_base_pair_xy.dat
FreeDOF_Options::show: free_suite_bonus: -1
FreeDOF_Options::show: free_2HOprime_bonus: -0.5
FreeDOF_Options::show: free_sugar_bonus: -1
FreeDOF_Options::show: pack_phosphate_penalty: 0.25
FreeDOF_Options::show: free_side_chain_bonus: -0.5


Practice: List the terms in the energy function and their relative weights

Hint: look at the top line that starts with 'weights'

In [5]:
## Your Response Here

### BEGIN SOLUTION

# (fa_atr 1) (fa_rep 0.55) (fa_sol 1) (fa_intra_rep 0.005)
# (fa_intra_sol_xover4 1) (lk_ball_wtd 1) (fa_elec 1)
# (pro_close 1.25) (hbond_sr_bb 1) (hbond_lr_bb 1)
# (hbond_bb_sc 1) (hbond_sc 1) (dslf_fa13 1.25) (omega 0.4)
# (fa_dun 0.7) (p_aa_pp 0.6) (yhh_planarity 0.625) (ref 1)
# (rama_prepro 0.45)

### END SOLUTION

Custom energy functions

You can also create a custom energy function that includes select terms. Typically, creating a whole new score function is unneccesary because the current one works well in most cases. However, tweaking the current enrgy function by reassigning weights and adding certain energy terms can be useful.

Here, we will make an example energy function with only the van der Waals attractive and repulsive terms, both with weights of 1. We need to use the set_weight(). Make a new ScoreFunction and set the weights accordingly. This is how we set the full-atom attractive (fa_atr) and the full-atom repulsive (fa_rep) terms.

sfxn2 = ScoreFunction()
sfxn2.set_weight(fa_atr, 1.0)
sfxn2.set_weight(fa_rep, 1.0)
In [6]:
### BEGIN SOLUTION
sfxn2 = ScoreFunction()
sfxn2.set_weight(fa_atr, 1.0)
sfxn2.set_weight(fa_rep, 1.0)
### END SOLUTION

Lets compare the score of ras using the full-atom ScoreFunction versus the ScoreFunction we made above using only the attractive and repulsive terms.

First, print the total energy of ras using print(sfxn(ras)) Then, print the attractive and repulsive energy only of ras using print(sfxn2(ras))

In [6]:
# print total energy of ras
### BEGIN SOLUTION
print(sfxn(ras))
### END SOLUTION

# print the attractive and repulsive energy of ras
### BEGIN SOLUTION
print(sfxn2(ras))
### END SOLUTION
basic.io.database: Database file opened: scoring/score_functions/elec_cp_reps.dat
core.scoring.elec.util: Read 40 countpair representative atoms
core.pack.dunbrack.RotamerLibrary: shapovalov_lib_fixes_enable option is true.
core.pack.dunbrack.RotamerLibrary: shapovalov_lib::shap_dun10_smooth_level of 1( aka lowest_smooth ) got activated.
core.pack.dunbrack.RotamerLibrary: Binary rotamer library selected: /Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin
core.pack.dunbrack.RotamerLibrary: Using Dunbrack library binary file '/Users/kathyle/Computational Protein Prediction and Design/PyRosetta4.Release.python36.mac.release-208/pyrosetta/database/rotamer/shapovalov/StpDwn_0-0-0/Dunbrack10.lib.bin'.
core.pack.dunbrack.RotamerLibrary: Dunbrack 2010 library took 0.450999 seconds to load from binary
1215.729069796814
154.59159174026854

Energy Breakdown

Using the full-atom ScoreFunction sfxn, break the energy of ras down into its individual pieces with the sfxn.show(ras) method. Which are the three most dominant contributions, and what are their values? Is this what you would have expected? Why? Note which terms are positive and negative

In [7]:
# use the sfxn.show() method
### BEGIN SOLUTION
sfxn.show(ras)
### END SOLUTION
core.scoring: 
------------------------------------------------------------
 Scores                       Weight   Raw Score Wghtd.Score
------------------------------------------------------------
 fa_atr                       1.000   -1039.246   -1039.246
 fa_rep                       0.550    1193.837     656.611
 fa_sol                       1.000     682.582     682.582
 fa_intra_rep                 0.005     700.419       3.502
 fa_intra_sol_xover4          1.000      46.564      46.564
 lk_ball_wtd                  1.000     -14.597     -14.597
 fa_elec                      1.000    -195.387    -195.387
 pro_close                    1.250      97.210     121.513
 hbond_sr_bb                  1.000     -41.656     -41.656
 hbond_lr_bb                  1.000     -28.352     -28.352
 hbond_bb_sc                  1.000     -13.111     -13.111
 hbond_sc                     1.000      -7.771      -7.771
 dslf_fa13                    1.250       0.000       0.000
 omega                        0.400      41.525      16.610
 fa_dun                       0.700    1296.642     907.650
 p_aa_pp                      0.600     -25.496     -15.298
 yhh_planarity                0.625       0.000       0.000
 ref                          1.000      47.114      47.114
 rama_prepro                  0.450     197.781      89.002
---------------------------------------------------
 Total weighted score:                     1215.729
In [8]:
# Your response here: what are the three most dominant contributions?

Unweighted, individual component energies of each residue in a structure are stored in the Pose object and can be accessed by the energies() method. For example, to break down the energy into each residue's contribution, we use:

print(ras.energies().show(<n>))

Where <n> is the residue number.

What are the total van der Waals, solvation, and hydrogen-bonding contributions of residue 24?

Note: The backbone hydrogen-bonding terms for each residue are not available from the Energies object. You can get them by using EnergyMethodOptions. See http://www.pyrosetta.org/documentation#TOC-Hydrogen-Bonds-and-Hydrogen-Bond-Scoring.

In [9]:
### BEGIN SOLUTION
print(ras.energies().show(24))
### END SOLUTION
core.scoring.Energies: E               fa_atr        fa_rep        fa_sol  fa_intra_repfa_intra_sol_x   lk_ball_wtd       fa_elec     pro_close   hbond_sr_bb   hbond_lr_bb   hbond_bb_sc      hbond_sc     dslf_fa13         omega        fa_dun       p_aa_pp yhh_planarity           ref   rama_prepro
core.scoring.Energies: E(i)  24         -7.40         19.03          2.94          8.76          0.09         -0.11         -0.56          0.00          0.00          0.00          0.00          0.00          0.00          0.12          2.68          0.06          0.00          2.30          3.58
None
In [10]:
# your response here

The van der Waals, solvation, and electrostatic terms are atom-atom pairwise energies calculated from a pre-tabulated lookup table, dependent upon the distance between the two atoms and their types. You can access this lookup table, called the etable directly to check these energy calculations on an atom-by-atom basis. Use the etable_atom_pair_energies function which returns a triplet of energies for attractive, repulsive and solvation scores.

(Note that the etable_atom_pair_energies() function requires Atom objects, not the AtomID objects we saw in Workshop #2. For more info, look at the documentation.)

Practice: What are the attractive, repulsive, solvation, and electrostatic components between the nitrogen of residue 24 and the oxygen of residue 20?

In [11]:
### BEGIN SOLUTION
res24 = ras.residue(24)
res20 = ras.residue(20)
res24_atomN = res24.atom_index("N")
res20_atomO = res20.atom_index("O")
pyrosetta.etable_atom_pair_energies(res24, res24_atomN, res20, res20_atomO, sfxn)
### END SOLUTION
Out[11]:
(-0.1505855046001568, 0.0, 0.5903452111877215, 2.173111777247698)