This notebook contains material from PyRosetta; content is available on Github.

HBNet Before Design

Keywords: HBNet, OperateOnResidueSubset, getPoseExtraScore, InterGroupInterfaceByVectorSelector, ChainSelector, PreventRepackingRLT, RestrictToRepackingRLT, OperateOnResidueSubset, ResiduePDBInfoHasLabelSelector, PackRotamersMover

Sometimes in Rosetta we want to run implicit multistage design. That is, we want to optimize one conformation while implicitly modeling another (either negatively or positively). There are many ways to accomplish this depending on your interests. In this section we will look at HBNet, a tool for explicitly designing hydrogen bond networks.

One negative-design approach is to implicitly model binding specificity. Designing a complicated network of hydrogen bonds at one interface will implicitly destabilize other interfaces. Hydrogen bonds are so sensitive to geometry that competing interfaces are unlikely to be able to "satisfy" the network well enough to remain competetive.

The previous example can also be viewed through the implicit positive-design lens as well. We often find that Rosetta designs very hydrophobic interfaces (especially with newer score functions). Running HBNet before the traditional design protocols can boost the polar residue concentration of your interface in exchange for a small cost packing quality. In other words, we can implicitly stabilize the unbound state by running HBNet, but it might mildly destabilize the bound state.

Our experience shows that it is useful to run both with and without HBNet, depending on your design case. It is possible that the default design protocol handles your implicits states well enough. When that fails, though, there is not much to do to fix it other than to run pre-design protocols like HBNet. An added benefit of HBNet is that it can provide "seeds" for packing, which can influence design diversity if nothing else.

In [1]:
# Notebook setup
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.mount_pyrosetta_install()
    print ("Notebook is set for PyRosetta use in Colab.  Have fun!")

Make sure you are in the directory with the pdb files:

cd google_drive/My\ Drive/student-notebooks/

In [4]:
# From previous section:
from pyrosetta import *
from pyrosetta.teaching import *
pyrosetta.init("-mute core -mute basic")
print( "init complete" )
PyRosetta-4 2019 [Rosetta PyRosetta4.Release.python36.mac 2019.31+release.9a323bc72ca18d3abdc8b1a730b37e52197e4ceb 2019-07-29T16:16:04] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
init complete

We prepare for HBNet the same way that we prepare for packing. We setup the pose and score function as before...

In [5]:
pose = pose_from_pdb("inputs/hbnet_example.pdb")
start_pose = Pose()
start_pose.assign(pose)
scorefxn = get_fa_scorefxn()
print( "setup complete" )
setup complete

Just like before, you can edit the resfile to your own personal specifications. Alternatively, you can use task operations to automate the process. Let's use task operations to fix all residues not at the interface.

Setting Designable Residues:

Create a new task for design

In [6]:
from pyrosetta.rosetta.core.select.residue_selector import InterGroupInterfaceByVectorSelector, ChainSelector, NotResidueSelector

chain1 = ChainSelector( "1" ) #selects the first chain
chain2 = ChainSelector( "2" ) #selects the second chain

interface_selector = InterGroupInterfaceByVectorSelector( chain1, chain2 );#selects residues at the interface
not_interface_selector = NotResidueSelector( interface_selector ); #selects residues not at the interface

from pyrosetta.rosetta.core.pack.task.operation import PreventRepackingRLT, RestrictToRepackingRLT, OperateOnResidueSubset

#prevent non interface residues from repacking/designing
fix_non_interface = OperateOnResidueSubset( PreventRepackingRLT(), not_interface_selector )

#perhaps we are performing one-sided design and do not want to make mutations on chain 2:
no_mutation_chain2 = OperateOnResidueSubset( RestrictToRepackingRLT(), chain2 )

from pyrosetta.rosetta.core.pack.task import TaskFactory
task_factory = TaskFactory()
task_factory.push_back( fix_non_interface )
task_factory.push_back( no_mutation_chain2 )

task_design = task_factory.create_task_and_apply_taskoperations( pose )
print( "Num residues: ", pose.size() )
print( "Num packable residues: ", task_design.num_to_be_packed() ) # this includes the ones being designed

num_designable = 0
for i in range( 1, pose.size() + 1 ):
    if( task_design.design_residue( i ) ):
        num_designable += 1;
print( "Num designable residues: ", num_designable )
Num residues:  454
Num packable residues:  116
Num designable residues:  53

Running HBNet

This is an interface case so we will use HBNetStapleInterface. We will use both the code-level interface, and the XML interface as an introduction to this functionality. The XML interface to PyRosetta will be covered more in later workshops.

In [7]:
from pyrosetta.rosetta.protocols.hbnet import HBNetStapleInterface

#This is similar to the clone operation, 
# but instead of copying the original pose, we copy the dtat of start pose INTO the pose

pose.assign(start_pose)

# There are two ways to setup a mover:
#   (1) by creating a blank instance of that mover and individually setting any non-default features
#   (2) by passing an XML string, similar to rosetta_scripts
# We will be using option 2 because it provides a more consistent interface to the movers
# Plus, the store_network_scores_in_pose feature is only available using option 2 for versions of PyRosetta older than September-ish 2019
setup_using_string = True #Option 2

if setup_using_string:
    hbnet = pyrosetta.rosetta.protocols.rosetta_scripts.XmlObjects.create_from_string("""
    <MOVERS>
    <HBNetStapleInterface name="hbnet" monte_carlo="true" store_network_scores_in_pose="true"/>
    </MOVERS>
    """).get_mover("hbnet")

    #monte_carlo="true" is highly recommended, especially for large systems like asymmetric interfaces
    #see PMID: 29652499
else:
    hbnet = HBNetStapleInterface()

    #This is highly recommended, especially for large systems like asymmetric interfaces
    #see PMID: 29652499
    hbnet.set_monte_carlo_branch( True )

    #set_monte_carlo_seed_must_be_buried does two things:
    #(1) speeds us up by decreasing the sample space
    #(2) ensures that our final hbond network will be at least partially buried
    #hbnet.set_monte_carlo_seed_must_be_buried( True )
    #commenting this out just to give us more results

#You can toggle the verbosity here if you are interested
#hbnet.verbose( True )

#We can normallly leave this as the default
#making it smaller now to let it run faster
hbnet.set_total_num_mc_runs( 1000 )
#Could have been done with
#<HBNetStapleInterface name="hbnet" monte_carlo="true" total_num_mc_runs="1000" store_network_scores_in_pose="true"/>

hbnet.task_factory( task_factory )
#alternatively:
#hbnet.set_task( task_design )

hbnet.set_score_function( scorefxn )
hbnet.apply(pose)

print( "Change in score", scorefxn(pose) - scorefxn(start_pose) )
protocols.rosetta_scripts.RosettaScriptsParser: Generating XML Schema for rosetta_scripts...
protocols.rosetta_scripts.RosettaScriptsParser: ...done
protocols.rosetta_scripts.RosettaScriptsParser: Initializing schema validator...
protocols.rosetta_scripts.RosettaScriptsParser: ...done
protocols.rosetta_scripts.RosettaScriptsParser: Validating input script...
protocols.rosetta_scripts.RosettaScriptsParser: ...done
protocols.rosetta_scripts.RosettaScriptsParser: Parsed script:
<ROSETTASCRIPTS>
	<MOVERS>
		<HBNetStapleInterface monte_carlo="true" name="hbnet" store_network_scores_in_pose="true"/>
	</MOVERS>
	<PROTOCOLS/>
</ROSETTASCRIPTS>
protocols.rosetta_scripts.RosettaScriptsParser: Defined mover named "hbnet" of type HBNetStapleInterface
protocols.rosetta_scripts.ParsedProtocol: ParsedProtocol mover with the following movers and filters
protocols.hbnet.HBNet:  Creating packer task based on specified task operations...
protocols.hbnet.HBNet:  Creating packer task based on specified task operations...
protocols.hbnet.HBNet:  built 9732 rotamers at 84 positions.
protocols.hbnet.HBNet:  IG: 1534084 bytes
protocols.hbnet.HBNet:  ==================================================================
protocols.hbnet.HBNet:  ============     PERFORMING H-BOND NETWORK DESIGN     ============
protocols.hbnet.HBNet:  ==================================================================
protocols.hbnet.HBNet: starting monte carlo branching protocol
protocols.hbnet.HBNet: 	10% done with branching
protocols.hbnet.HBNet: 	20% done with branching
protocols.hbnet.HBNet: 	30% done with branching
protocols.hbnet.HBNet: 	40% done with branching
protocols.hbnet.HBNet: 	50% done with branching
protocols.hbnet.HBNet: 	60% done with branching
protocols.hbnet.HBNet: 	70% done with branching
protocols.hbnet.HBNet: 	80% done with branching
protocols.hbnet.HBNet: 	90% done with branching
protocols.hbnet.HBNet: 	100% done with branching
protocols.hbnet.HBNet: Monte carlo branching protocol found 196 networks.
protocols.hbnet.HBNet:  NUMBER OF H-BOND NETWORKS AFTER BRANCH: 134
protocols.hbnet.HBNet:  NUMBER OF NETWORKS AFTER REMOVE REPLICATES = 131
protocols.hbnet.HBNet:  NUMBER OF NETWORKS AFTER SELECTION = 10
protocols.hbnet.HBNet:  NUMBER OF NETWORKS AFTER REMOVE_REP = 10
protocols.hbnet.HBNet:  Designed these new networks that meet your criteria:
protocols.hbnet.HBNet: 	#HBNet_rank 	 residues 	 size 	 score 	 num_hbonds 	 percent_hbond_capacity 	 num_unsat_Hpol 	interf_hbs
protocols.hbnet.HBNet: 	#network_1	D_Y_155,D_R_180,D_E_185,D_R_187,E_R_207,backbone	5	-0.650077	8	0.714286	0	3
protocols.hbnet.HBNet: 	#network_2	D_Y_155,D_E_185,D_R_187,E_R_207,backbone	4	-1.07079	6	0.6875	0	3
protocols.hbnet.HBNet: 	#network_3	D_S_154,D_S_189,E_R_207	3	2.18313	4	0.666667	0	2
protocols.hbnet.HBNet: 	#network_4	D_Y_155,D_R_180,D_E_185,E_R_207	4	-0.93091	5	0.625	0	3
protocols.hbnet.HBNet: 	#network_5	D_Y_33,D_R_40,D_H_50,D_E_116,E_R_107	5	1.32341	6	0.611111	0	1
protocols.hbnet.HBNet: 	#network_6	D_Y_33,D_R_40,D_Q_44,D_H_50,D_E_116,E_R_107	6	0.893853	7	0.590909	0	1
protocols.hbnet.HBNet: 	#network_7	D_Y_33,D_R_40,D_Q_44,D_H_50,D_E_116,E_Q_44,E_R_107	7	0.71998	8	0.576923	0	2
protocols.hbnet.HBNet: 	#network_8	D_Y_33,D_R_40,D_E_116,E_R_107	4	1.33101	5	0.5625	0	1
protocols.hbnet.HBNet: 	#network_9	D_H_116,E_S_40,E_R_107	3	2.49845	3	0.555556	0	1
protocols.hbnet.HBNet: 	#network_10	D_Y_155,D_E_185,E_R_207	3	-0.259134	3	0.545455	0	3
protocols.evaluation.ChiWellRmsdEvaluatorCreator: Evaluation Creator active ...
protocols.hbnet.HBNet: 
Change in score 1209.2788032525623

Wait, my score is terrible.

Question: Why?

In [ ]:
 

Finishing Design:

Well of course the score is terrible, the pose is dense with clashes. We had 116 packable residues and only assigned states to 5 of them. The other 111 residues are still in their input conformations and likely clash with the 5 we just assigned.

We need to run the packer (either using PackRotamersMover or FastDesign) but we don't want to overwrite the residues we just assigned with HBNet. The trick here is to select the residues with "HBNet" labels and fix them.

In [8]:
from pyrosetta.rosetta.core.select.residue_selector import ResiduePDBInfoHasLabelSelector

#prevent hbnet residues from repacking/designing
hbnet_selector = ResiduePDBInfoHasLabelSelector( "HBNet" )
fix_hbnet = OperateOnResidueSubset( PreventRepackingRLT(), hbnet_selector )
task_factory.push_back( fix_hbnet ) #recycling the same factory as before, just adding a new operation
task_design2 = task_factory.create_task_and_apply_taskoperations( pose )

#sanity check
num_hbnet_residues = 0
for x in hbnet_selector.apply( pose ):
    if x:
        num_hbnet_residues += 1
print( "Num HBNet Residues", num_hbnet_residues )

#this is unrelated to the narrative but I highly recommend using the linear memory interaction graph whenever performing design. It's a huge speedup
#it does not seem to matter for the scope here, but it will when you start using extra chi sampling (-ex1, -ex2)
task_design2.or_linmem_ig( True )

from pyrosetta.rosetta.protocols.minimization_packing import PackRotamersMover
pack_mover = PackRotamersMover( scorefxn, task_design2 )
pack_mover.apply( pose )
print( "Change in score", scorefxn(pose) - scorefxn(start_pose) )
Num HBNet Residues 5
Change in score 158.01310006922307

We made it...?

The change in score is a better, but still positive. One great thing about HBNet is that it can return multiple poses. Each one is slightly worse than the previous by HBNet's standards but might design into something better. Let's try a few to see if they help.

In [6]:
#there were 10 (or so) networks total, but let's just try the next 5
#this might take a few minutes...
if not os.getenv('DEBUG'): #Adding this line to decrease runtime on the testing server
    for x in range(0,5):
        extra_pose = hbnet.get_additional_output()
        if extra_pose is None:
            break
        task_design3 = task_factory.create_task_and_apply_taskoperations( extra_pose )
        task_design3.or_linmem_ig( True )
        pack_mover = PackRotamersMover( scorefxn, task_design3 )
        pack_mover.apply( extra_pose )
        print( "Change in score", scorefxn(extra_pose) - scorefxn(start_pose) )
protocols.hbnet.HBNet: 
core.pack.pack_rotamers: built 12700 rotamers at 112 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating LinearMemoryInteractionGraph
Change in score 208.0715304180745
protocols.hbnet.HBNet: 
core.pack.pack_rotamers: built 12890 rotamers at 113 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating LinearMemoryInteractionGraph
Change in score -60.13706595618231
protocols.hbnet.HBNet: 
core.pack.pack_rotamers: built 12595 rotamers at 111 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating LinearMemoryInteractionGraph
Change in score 62.89829311472579
protocols.hbnet.HBNet: 
core.pack.pack_rotamers: built 11897 rotamers at 111 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating LinearMemoryInteractionGraph
Change in score -45.536073668849326
protocols.hbnet.HBNet: 
core.pack.pack_rotamers: built 11684 rotamers at 110 positions.
core.pack.interaction_graph.interaction_graph_factory: Instantiating LinearMemoryInteractionGraph
Change in score -31.404979911851626

But wait, there's more??

Great! We found a few results that designed to me more stable than the input pose (-60, -45, and -31 REU)!

The main score function is not the only way to evaluate these networks. HBNet also adds its own score terms. These are useful for sorting/filtering decoys before running expensive packing calculations.

In [11]:
from pyrosetta.rosetta.core.pose import hasPoseExtraScore, getPoseExtraScore

if hasPoseExtraScore( pose, "HBNet_NumUnsatHpol" ):
    #All 3 of these metrics are explained in more detail in Maguire, Boyken, et al. (see second reference below)

    #NumUnsatHpol is HBNet's primary sorting metric, it counts the number of polar hydrogen atoms that are unsatisfied (buried and not forming hbonds). We know that there are no heavy (non-hydrogen) unsatisfied atoms because HBNet filters those out by default. Lower is better
    print( "HBNet_NumUnsatHpol", getPoseExtraScore( pose, "HBNet_NumUnsatHpol" ) )

    #HBNet's secondary sorting metric. 1.0 if every polar atom in the network is forming the maximum number of hbonds. Higher is better
    print( "HBNet_Saturation", getPoseExtraScore( pose, "HBNet_Saturation" ) )

    #HBNet's tertiary sorting metric. A little complicated but lower is better.
    print( "HBNet_Score", getPoseExtraScore( pose, "HBNet_Score" ) )
else:
    print( "Somebody go bug a developer to enable this feature for PyRosetta" )
HBNet_NumUnsatHpol 0.0
HBNet_Saturation 0.7142857313156128
HBNet_Score -0.6500769257545471

Advice for using this in the wild

ex1 ex2

HBonds are very sensitive to sidechain sampling resolution. I highly recommend using -ex1 and -ex2. You can do this by adding:

ex1ex2 = ExtraRotamersGeneric()
ex1ex2.ex1( True )
ex1ex2.ex2( True )
task_factory.push_back( ex1ex2 )

get_additional_output

As we saw in the exercise, the first result out of HBNet does not always wind up being the best. Try designing with a few results from hbnet.get_additional_output() to get more coverage of the design space. For the commandline users reading this, this functionality can also be accessed via multistage_rosetta_scripts or the MultiplePoseMover in rosetta_scripts. See the rosetta_scripts_scripts repository for examples.

set_monte_carlo_seed_must_be_buried

I highly recommend playing with the set_monte_carlo_seed_must_be_buried mentioned above. Without it, HBNet tends to just design many surface networks that nobody really cares about.

Thought Question

The energy of HBNet+Design is often less favorable that the energy after an equivalent design run without HBNet. Why do people still use HBNet?

References

  • Boyken SE, Chen Z, Groves B, et al. De novo design of protein homo-oligomers with modular hydrogen-bond network-mediated specificity. Science. 2016;352(6286):680–687. doi:10.1126/science.aad8865
  • Maguire JB, Boyken SE, Baker D, Kuhlman B. Rapid Sampling of Hydrogen Bond Networks for Computational Protein Design. J Chem Theory Comput. 2018;14(5):2751–2760. doi:10.1021/acs.jctc.8b00033
In [ ]: