This notebook contains material from PyRosetta; content is available on GitHub.

RosettaAntibody Framework

Keywords: CDRResidueSelector

Overview

In this workshop we will learn how to use the RosettaAntibody framework. Unfortunately, the full RosettaAntibody modeling code is not available in PyRosetta, because it is built around an application. To run it, you will have to use either the ROSIE server or the Rosetta application.

For a full overview of the RosettaAntibody modeling application, see this paper: https://www.ncbi.nlm.nih.gov/pubmed/28125104

SnugDock and the H3 modeling component of RosettaAntibody are available here as movers.

In [ ]:
# Notebook setup
import sys
if 'google.colab' in sys.modules:
    !pip install pyrosettacolabsetup
    import pyrosettacolabsetup
    pyrosettacolabsetup.mount_pyrosetta_install()
    print ("Notebook is set for PyRosetta use in Colab.  Have fun!")

Make sure you are in the directory with the pdb files:

cd google_drive/My\ Drive/student-notebooks/

Imports

Let's import the antibody namespace so we can start using it. Take a look at the different modules that are a part of the antibody module.

Note that we can also do from rosetta.protocols.antibody import * in order to make accessing the enums much easier. For the purpose of this workshop, we will use antibody to traverse the contents. This makes it easier for you to use tab completion for exploration.

In [12]:
#Python
from pyrosetta import *
from pyrosetta.rosetta import *
from pyrosetta.teaching import *

#Core Includes
from rosetta.core.select import residue_selector as selections

from rosetta.protocols import antibody

Initialization

Here, we will initialize a typical run of Rosetta. We could use the -input_ab_scheme option with AHo_Scheme, but we will learn to instead pass this to our main antibody framework code.

In [13]:
init('-use_input_sc -ignore_unrecognized_res \
     -ignore_zero_occupancy false -load_PDB_components false -no_fconfig')
PyRosetta-4 2019 [Rosetta PyRosetta4.Release.python36.mac 2019.33+release.1e60c63beb532fd475f0f704d68d462b8af2a977 2019-08-09T15:19:57] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: Rosetta version: PyRosetta4.Release.python36.mac r230 2019.33+release.1e60c63beb5 1e60c63beb532fd475f0f704d68d462b8af2a977 http://www.pyrosetta.org 2019-08-09T15:19:57
core.init: command: PyRosetta -use_input_sc -ignore_unrecognized_res -ignore_zero_occupancy false -load_PDB_components false -no_fconfig -database /Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyrosetta-2019.33+release.1e60c63beb5-py3.6-macosx-10.6-intel.egg/pyrosetta/database
basic.random.init_random_generator: 'RNG device' seed mode, using '/dev/urandom', seed=967592561 seed_offset=0 real_seed=967592561
basic.random.init_random_generator: RandomGenerator:init: Normal mode, seed=967592561 RG_type=mt19937

Import and copy pose

Let's load an antibody - this is the same antibody we used to learn packing and design. :)

In [14]:
#Import a pose
pose = pose_from_pdb("inputs/2r0l_1_1.pdb")
original_pose = pose.clone()
core.import_pose.import_pose: File 'inputs/2r0l_1_1.pdb' automatically determined to be of type PDB
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OXT on residue ARG:CtermProteinFull 108
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OXT on residue SER:CtermProteinFull 225
core.conformation.Conformation: [ WARNING ] missing heavyatom:  OXT on residue ARG:CtermProteinFull 464
core.conformation.Conformation: Found disulfide between residues 23 88
core.conformation.Conformation: current variant for 23 CYS
core.conformation.Conformation: current variant for 88 CYS
core.conformation.Conformation: current variant for 23 CYD
core.conformation.Conformation: current variant for 88 CYD
core.conformation.Conformation: Found disulfide between residues 130 204
core.conformation.Conformation: current variant for 130 CYS
core.conformation.Conformation: current variant for 204 CYS
core.conformation.Conformation: current variant for 130 CYD
core.conformation.Conformation: current variant for 204 CYD
core.conformation.Conformation: Found disulfide between residues 250 266
core.conformation.Conformation: current variant for 250 CYS
core.conformation.Conformation: current variant for 266 CYS
core.conformation.Conformation: current variant for 250 CYD
core.conformation.Conformation: current variant for 266 CYD
core.conformation.Conformation: Found disulfide between residues 258 328
core.conformation.Conformation: current variant for 258 CYS
core.conformation.Conformation: current variant for 328 CYS
core.conformation.Conformation: current variant for 258 CYD
core.conformation.Conformation: current variant for 328 CYD
core.conformation.Conformation: Found disulfide between residues 353 422
core.conformation.Conformation: current variant for 353 CYS
core.conformation.Conformation: current variant for 422 CYS
core.conformation.Conformation: current variant for 353 CYD
core.conformation.Conformation: current variant for 422 CYD
core.conformation.Conformation: Found disulfide between residues 385 401
core.conformation.Conformation: current variant for 385 CYS
core.conformation.Conformation: current variant for 401 CYS
core.conformation.Conformation: current variant for 385 CYD
core.conformation.Conformation: current variant for 401 CYD
core.conformation.Conformation: Found disulfide between residues 412 440
core.conformation.Conformation: current variant for 412 CYS
core.conformation.Conformation: current variant for 440 CYS
core.conformation.Conformation: current variant for 412 CYD
core.conformation.Conformation: current variant for 440 CYD

AntibodyInfo

The main tool that we will use is the AntibodyInfo object. This allows you to get a TON of information about the antibody to use in various custom protocols.

Note that this antibody has already been renumbered using the PyIgClassify server.

Since we are not defining the numbering scheme and cdr definition during init, we will need to pass an Enum to the AntibodyInfo object.

In [15]:
ab_info = antibody.AntibodyInfo(pose, antibody.AHO_Scheme, antibody.North)
basic.io.database: Database file opened: sampling/antibodies/cluster_center_dihedrals.txt
protocols.antibody.AntibodyNumberingParser: Antibody numbering scheme definitions read successfully
protocols.antibody.AntibodyNumberingParser: Antibody CDR definition read successfully
antibody.AntibodyInfo: Successfully finished the CDR definition
antibody.AntibodyInfo: AC Detecting Regular CDR H3 Stem Type
antibody.AntibodyInfo: ARFWWRSFDYW
antibody.AntibodyInfo: AC Finished Detecting Regular CDR H3 Stem Type: KINKED
antibody.AntibodyInfo: AC Finished Detecting Regular CDR H3 Stem Type: Kink: 1 Extended: 0
antibody.AntibodyInfo: Setting up CDR Cluster for H1
protocols.antibody.cluster.CDRClusterMatcher: Length: 13 Omega: TTTTTTTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for H2
protocols.antibody.cluster.CDRClusterMatcher: Length: 10 Omega: TTTTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for H3
protocols.antibody.cluster.CDRClusterMatcher: Length: 10 Omega: TTTTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for L1
protocols.antibody.cluster.CDRClusterMatcher: Length: 11 Omega: TTTTTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for L2
protocols.antibody.cluster.CDRClusterMatcher: Length: 8 Omega: TTTTTTTT
antibody.AntibodyInfo: Setting up CDR Cluster for L3
protocols.antibody.cluster.CDRClusterMatcher: Length: 9 Omega: TTTTTTCTT

Let's take a look at what AntibodyInfo prints:

In [16]:
print(ab_info)
////////////////////////////////////////////////////////////////////////////////
///                          Rosetta Antibody Info                           ///
///                                                                          ///
///             Antibody Type:  Regular Antibody
///             Light Chain Type:  unknown
/// Predict H3 Cterminus Base:  KINKED
///                                                                          
/// H1 info: 
///            length:  13
///          sequence:  AASGFTISNSGIH
///     north_cluster:  H1-13-1
///         loop_info:  LOOP start: 131  stop: 143  cut: 137  size: 13  skip rate: 0  extended?: False

/// H2 info: 
///            length:  10
///          sequence:  WIYPTGGATD
///     north_cluster:  H2-10-1
///         loop_info:  LOOP start: 158  stop: 167  cut: 163  size: 10  skip rate: 0  extended?: False

/// H3 info: 
///            length:  10
///          sequence:  ARFWWRSFDY
///     north_cluster:  H3-10-1
///         loop_info:  LOOP start: 205  stop: 214  cut: 206  size: 10  skip rate: 0  extended?: False

/// L1 info: 
///            length:  11
///          sequence:  RASQDVSTAVA
///     north_cluster:  L1-11-1
///         loop_info:  LOOP start: 24  stop: 34  cut: 29  size: 11  skip rate: 0  extended?: False

/// L2 info: 
///            length:  8
///          sequence:  YSASFLYS
///     north_cluster:  L2-8-1
///         loop_info:  LOOP start: 49  stop: 56  cut: 53  size: 8  skip rate: 0  extended?: False

/// L3 info: 
///            length:  9
///          sequence:  QQSYTTPPT
///     north_cluster:  L3-9-cis7-1
///         loop_info:  LOOP start: 89  stop: 97  cut: 93  size: 9  skip rate: 0  extended?: False

////////////////////////////////////////////////////////////////////////////////

Isn't that AWESOME!! I think so. But I wrote a lot of that code!

Anyway, as you can see, you can get a fair bit of information out of the AntibodyInfo object. In fact, most antibody-related code either takes an AntibodyInfo object or constructs one from the numbering scheme, CDR definition, and pose passed to it. You will see this as we go.

Note the north_cluster here. This is useful in some modeling tasks, but becomes much more relevant during antibody design. More information on what we mean by north_cluster can be found in this paper, if you want to read ahead a bit. https://www.ncbi.nlm.nih.gov/pubmed/21035459

Basic AntibodyInfo Access

Now, let's use the AntibodyInfo class to get a bit of useful information out of our antibody.

In [17]:
print("h1", ab_info.get_CDR_start(antibody.h1, pose))
print("h2", ab_info.get_CDR_end(antibody.h2, pose))
h1 131
h2 167

Now let's use these enums a bit more. They go in order from 1 to 8, with 7 and 8 being the CDR4 loops - also known as DE loops. We won't worry about them just yet.

In [18]:
for i in range(1, 7):
    print(i, ab_info.get_CDR_name(antibody.CDRNameEnum(i)))
    
for cdr in ['L1', 'l1', 'L2', 'l2', 'L3', 'H1', 'H2', 'H3']:
    print(cdr, str(ab_info.get_CDR_name_enum(cdr)))
          
print(str(antibody.h3))
print(int(antibody.h3))
1 H1
2 H2
3 H3
4 L1
5 L2
6 L3
L1 CDRNameEnum.l1
l1 CDRNameEnum.l1
L2 CDRNameEnum.l2
l2 CDRNameEnum.l2
L3 CDRNameEnum.l3
H1 CDRNameEnum.h1
H2 CDRNameEnum.h2
H3 CDRNameEnum.h3
CDRNameEnum.h3
3

Does this make enums a bit less confusing? They are named integers. The last two print statements show either the actual CDR name enum or the integer behind it. The cool thing here is that we can loop through all of the CDRs just by using a range of 1-6, and Rosetta will understand it.

Note that we convert the integer into a CDRNameEnum in the function. If we are storing the cdr name enums as indexes to a dictionary or list, we don't need this. That is simply for the C++ code to work properly.
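
The "named integer" idea can be tried out in plain Python, without PyRosetta. The sketch below is only an analogy: CDRName is a hypothetical stand-in built on enum.IntEnum, with values copied from the output above. It is not the PyRosetta CDRNameEnum class.

```python
from enum import IntEnum


class CDRName(IntEnum):
    """Hypothetical stand-in for antibody.CDRNameEnum (values from the output above)."""
    h1 = 1
    h2 = 2
    h3 = 3
    l1 = 4
    l2 = 5
    l3 = 6


# Integer -> named enum, mirroring antibody.CDRNameEnum(i) in the loop above.
for i in range(1, 7):
    print(i, CDRName(i).name.upper())

# Enum members work as dictionary keys; lengths are from the AntibodyInfo printout.
cdr_lengths = {CDRName.h1: 13, CDRName.h2: 10, CDRName.h3: 10,
               CDRName.l1: 11, CDRName.l2: 8, CDRName.l3: 9}
print(int(CDRName.h3), cdr_lengths[CDRName.h3])
```

Because an IntEnum member compares equal to its integer, either form can be used as a key once the dictionary is built, which is the same convenience the notebook relies on when it loops over range(1, 7).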

AntibodyEnumManager

So we have seen that we can do some of this directly within AntibodyInfo itself. Cool. But what if we need something more advanced? Let's use the class that actually does all this conversion.

In [19]:
enum_manager = antibody.AntibodyEnumManager()
print(enum_manager.numbering_scheme_enum_to_string(antibody.AHO_Scheme))
print(enum_manager.cdr_definition_enum_to_string(antibody.North))
print(enum_manager.cdr_name_string_to_enum("H1"))
print(enum_manager.antibody_region_enum_to_string(antibody.framework_region))
AHO_Scheme
North
CDRNameEnum.h1
framework_region

Use the functions get_region_of_residue and get_CDRNameEnum_of_residue, together with the manager, to traverse the antibody and get the relevant region of every residue in the pose.

In [20]:
### BEGIN SOLUTION

for i in range(1, pose.size()+1):
    region = ab_info.get_region_of_residue(pose, i)
    if (region == antibody.cdr_region):
        print(i, enum_manager.cdr_name_enum_to_string(ab_info.get_CDRNameEnum_of_residue(pose, i)))
    else:
        print(i, enum_manager.antibody_region_enum_to_string(region))
              
### END SOLUTION
1 framework_region
2 framework_region
3 framework_region
4 framework_region
5 framework_region
6 framework_region
7 framework_region
8 framework_region
9 framework_region
10 framework_region
11 framework_region
12 framework_region
13 framework_region
14 framework_region
15 framework_region
16 framework_region
17 framework_region
18 framework_region
19 framework_region
20 framework_region
21 framework_region
22 framework_region
23 framework_region
24 L1
25 L1
26 L1
27 L1
28 L1
29 L1
30 L1
31 L1
32 L1
33 L1
34 L1
35 framework_region
36 framework_region
37 framework_region
38 framework_region
39 framework_region
40 framework_region
41 framework_region
42 framework_region
43 framework_region
44 framework_region
45 framework_region
46 framework_region
47 framework_region
48 framework_region
49 L2
50 L2
51 L2
52 L2
53 L2
54 L2
55 L2
56 L2
57 framework_region
58 framework_region
59 framework_region
60 framework_region
61 framework_region
62 framework_region
63 framework_region
64 framework_region
65 framework_region
66 framework_region
67 framework_region
68 framework_region
69 framework_region
70 framework_region
71 framework_region
72 framework_region
73 framework_region
74 framework_region
75 framework_region
76 framework_region
77 framework_region
78 framework_region
79 framework_region
80 framework_region
81 framework_region
82 framework_region
83 framework_region
84 framework_region
85 framework_region
86 framework_region
87 framework_region
88 framework_region
89 L3
90 L3
91 L3
92 L3
93 L3
94 L3
95 L3
96 L3
97 L3
98 framework_region
99 framework_region
100 framework_region
101 framework_region
102 framework_region
103 framework_region
104 framework_region
105 framework_region
106 framework_region
107 framework_region
108 framework_region
109 framework_region
110 framework_region
111 framework_region
112 framework_region
113 framework_region
114 framework_region
115 framework_region
116 framework_region
117 framework_region
118 framework_region
119 framework_region
120 framework_region
121 framework_region
122 framework_region
123 framework_region
124 framework_region
125 framework_region
126 framework_region
127 framework_region
128 framework_region
129 framework_region
130 framework_region
131 H1
132 H1
133 H1
134 H1
135 H1
136 H1
137 H1
138 H1
139 H1
140 H1
141 H1
142 H1
143 H1
144 framework_region
145 framework_region
146 framework_region
147 framework_region
148 framework_region
149 framework_region
150 framework_region
151 framework_region
152 framework_region
153 framework_region
154 framework_region
155 framework_region
156 framework_region
157 framework_region
158 H2
159 H2
160 H2
161 H2
162 H2
163 H2
164 H2
165 H2
166 H2
167 H2
168 framework_region
169 framework_region
170 framework_region
171 framework_region
172 framework_region
173 framework_region
174 framework_region
175 framework_region
176 framework_region
177 framework_region
178 framework_region
179 framework_region
180 framework_region
181 framework_region
182 framework_region
183 framework_region
184 framework_region
185 framework_region
186 framework_region
187 framework_region
188 framework_region
189 framework_region
190 framework_region
191 framework_region
192 framework_region
193 framework_region
194 framework_region
195 framework_region
196 framework_region
197 framework_region
198 framework_region
199 framework_region
200 framework_region
201 framework_region
202 framework_region
203 framework_region
204 framework_region
205 H3
206 H3
207 H3
208 H3
209 H3
210 H3
211 H3
212 H3
213 H3
214 H3
215 framework_region
216 framework_region
217 framework_region
218 framework_region
219 framework_region
220 framework_region
221 framework_region
222 framework_region
223 framework_region
224 framework_region
225 framework_region
226 antigen_region
227 antigen_region
228 antigen_region
229 antigen_region
230 antigen_region
231 antigen_region
232 antigen_region
233 antigen_region
234 antigen_region
235 antigen_region
236 antigen_region
237 antigen_region
238 antigen_region
239 antigen_region
240 antigen_region
241 antigen_region
242 antigen_region
243 antigen_region
244 antigen_region
245 antigen_region
246 antigen_region
247 antigen_region
248 antigen_region
249 antigen_region
250 antigen_region
251 antigen_region
252 antigen_region
253 antigen_region
254 antigen_region
255 antigen_region
256 antigen_region
257 antigen_region
258 antigen_region
259 antigen_region
260 antigen_region
261 antigen_region
262 antigen_region
263 antigen_region
264 antigen_region
265 antigen_region
266 antigen_region
267 antigen_region
268 antigen_region
269 antigen_region
270 antigen_region
271 antigen_region
272 antigen_region
273 antigen_region
274 antigen_region
275 antigen_region
276 antigen_region
277 antigen_region
278 antigen_region
279 antigen_region
280 antigen_region
281 antigen_region
282 antigen_region
283 antigen_region
284 antigen_region
285 antigen_region
286 antigen_region
287 antigen_region
288 antigen_region
289 antigen_region
290 antigen_region
291 antigen_region
292 antigen_region
293 antigen_region
294 antigen_region
295 antigen_region
296 antigen_region
297 antigen_region
298 antigen_region
299 antigen_region
300 antigen_region
301 antigen_region
302 antigen_region
303 antigen_region
304 antigen_region
305 antigen_region
306 antigen_region
307 antigen_region
308 antigen_region
309 antigen_region
310 antigen_region
311 antigen_region
312 antigen_region
313 antigen_region
314 antigen_region
315 antigen_region
316 antigen_region
317 antigen_region
318 antigen_region
319 antigen_region
320 antigen_region
321 antigen_region
322 antigen_region
323 antigen_region
324 antigen_region
325 antigen_region
326 antigen_region
327 antigen_region
328 antigen_region
329 antigen_region
330 antigen_region
331 antigen_region
332 antigen_region
333 antigen_region
334 antigen_region
335 antigen_region
336 antigen_region
337 antigen_region
338 antigen_region
339 antigen_region
340 antigen_region
341 antigen_region
342 antigen_region
343 antigen_region
344 antigen_region
345 antigen_region
346 antigen_region
347 antigen_region
348 antigen_region
349 antigen_region
350 antigen_region
351 antigen_region
352 antigen_region
353 antigen_region
354 antigen_region
355 antigen_region
356 antigen_region
357 antigen_region
358 antigen_region
359 antigen_region
360 antigen_region
361 antigen_region
362 antigen_region
363 antigen_region
364 antigen_region
365 antigen_region
366 antigen_region
367 antigen_region
368 antigen_region
369 antigen_region
370 antigen_region
371 antigen_region
372 antigen_region
373 antigen_region
374 antigen_region
375 antigen_region
376 antigen_region
377 antigen_region
378 antigen_region
379 antigen_region
380 antigen_region
381 antigen_region
382 antigen_region
383 antigen_region
384 antigen_region
385 antigen_region
386 antigen_region
387 antigen_region
388 antigen_region
389 antigen_region
390 antigen_region
391 antigen_region
392 antigen_region
393 antigen_region
394 antigen_region
395 antigen_region
396 antigen_region
397 antigen_region
398 antigen_region
399 antigen_region
400 antigen_region
401 antigen_region
402 antigen_region
403 antigen_region
404 antigen_region
405 antigen_region
406 antigen_region
407 antigen_region
408 antigen_region
409 antigen_region
410 antigen_region
411 antigen_region
412 antigen_region
413 antigen_region
414 antigen_region
415 antigen_region
416 antigen_region
417 antigen_region
418 antigen_region
419 antigen_region
420 antigen_region
421 antigen_region
422 antigen_region
423 antigen_region
424 antigen_region
425 antigen_region
426 antigen_region
427 antigen_region
428 antigen_region
429 antigen_region
430 antigen_region
431 antigen_region
432 antigen_region
433 antigen_region
434 antigen_region
435 antigen_region
436 antigen_region
437 antigen_region
438 antigen_region
439 antigen_region
440 antigen_region
441 antigen_region
442 antigen_region
443 antigen_region
444 antigen_region
445 antigen_region
446 antigen_region
447 antigen_region
448 antigen_region
449 antigen_region
450 antigen_region
451 antigen_region
452 antigen_region
453 antigen_region
454 antigen_region
455 antigen_region
456 antigen_region
457 antigen_region
458 antigen_region
459 antigen_region
460 antigen_region
461 antigen_region
462 antigen_region
463 antigen_region
464 antigen_region
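
A per-residue listing like the one above is easier to read when collapsed into contiguous ranges. The following is a minimal pure-Python sketch of that idea. summarize_regions is a hypothetical helper (not part of PyRosetta), and the toy input only reproduces the first 36 residues of the listing.

```python
from itertools import groupby


def summarize_regions(labels):
    """Collapse (residue, label) pairs into (start, stop, label) runs.

    Assumes the pairs are given in increasing, consecutive residue order.
    """
    runs = []
    for label, group in groupby(labels, key=lambda pair: pair[1]):
        residues = [resnum for resnum, _ in group]
        runs.append((residues[0], residues[-1], label))
    return runs


# Toy input matching the start of the listing above (residues 1-36).
labels = ([(i, "framework_region") for i in range(1, 24)]
          + [(i, "L1") for i in range(24, 35)]
          + [(i, "framework_region") for i in range(35, 37)])

for start, stop, label in summarize_regions(labels):
    print(f"{start}-{stop} {label}")
```

In the notebook, the same list could be built by appending (i, name) inside the traversal loop instead of printing each residue.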

CDR Clusters

Use either the PyRosetta docs on AntibodyInfo, or the interactive notebook to use AntibodyInfo to get the length and cluster of L1.

In [21]:
### BEGIN SOLUTION

print(ab_info.get_CDR_length(antibody.l1))
print(ab_info.get_CDR_cluster(antibody.l1).cluster())

### END SOLUTION
11
CDRClusterEnum.L1_11_1

The CDRCluster object has a lot of information about a particular cluster. Let's use it to get the normalized distance in degrees of the L1 cluster.

In [22]:
L1_cluster = ab_info.get_CDR_cluster(antibody.l1)
print(L1_cluster.normalized_distance_in_degrees())
7.137242784087944

Anything below 35 or 40 degrees is very close to the cluster center. This is a structure with a very well-defined L1-11-1 loop - one of the most common L1 lengths and clusters.

Numbering Scheme Translation

It may not seem like much, but numbering scheme translation is very difficult to do without mistakes. Rosetta now has a highly tested and fairly easy-to-use implementation of it, which makes it much easier to follow antibody structural or sequence papers. Let's take a look. We'll use AntibodyInfo and the get_landmark_resnum() function for this, but you could also use the get_antibody_numbering_info() function, which gives you all the conversions - though it is certainly a bit trickier to use.

Conserved Intra-Domain Cysteine

The conserved cysteine residue forming the intradomain disulfide bridge always carries the label "23" in AHo numbering, as in the IMGT numbering scheme, while in Kabat it is labeled L23 in Vk and Vl, and H22 in VH. Let's find this residue in our antibody. https://www.bioc.uzh.ch/plueckthun/antibody/Numbering/FR1/index.html

In [27]:
rosetta_num = ab_info.get_landmark_resnum(pose, antibody.Kabat_Scheme, 'H', 22)

What is the chain and resnum in OUR AHo numbering scheme? Is this a cysteine? How about a disulfide?

In [28]:
### BEGIN SOLUTION
print(pose.pdb_info().pose2pdb(rosetta_num))
print(pose.residue(rosetta_num))
### END SOLUTION
23 H 
Residue 130: CYS:disulfide (CYS, C):
Base: CYS
 Properties: POLYMER PROTEIN CANONICAL_AA SC_ORBITALS METALBINDING DISULFIDE_BONDED ALPHA_AA L_AA
 Variant types: DISULFIDE
 Main-chain atoms:  N    CA   C  
 Backbone atoms:    N    CA   C    O    H    HA 
 Side-chain atoms:  CB   SG  1HB  2HB 
Atom Coordinates:
   N  : -13.918, -0.011, 40.022
   CA : -15.022, -0.943, 39.837
   C  : -16.073, -0.624, 40.895
   O  : -15.877, -0.945, 42.066
   CB : -14.515, -2.379, 40.021
   SG : -15.8, -3.608, 40.319
   H  : -13.6187, 0.217354, 40.9592
   HA : -15.4065, -0.826975, 38.8236
  1HB : -13.9648, -2.68746, 39.1317
  2HB : -13.8236, -2.41565, 40.8626
Mirrored relative to coordinates in ResidueType: FALSE

Ok. Cool. Let's do the same thing for the cysteine that is disulfide-bonded to this residue. In IMGT this is residue 104 on the heavy chain. Use tab completion to find the antibody.IMGT_Scheme enum. https://www.bioc.uzh.ch/plueckthun/antibody/Numbering/FR3a/index.html

In [29]:
### BEGIN SOLUTION
pre_cdr3_c = ab_info.get_landmark_resnum(pose, antibody.IMGT_Scheme, 'H', 104)
### END SOLUTION

Once again, what is the residue in our AHO-numbered antibody? Is it a Cysteine? Is it disulfide bonded?

In [31]:
### BEGIN SOLUTION
print(pose.pdb_info().pose2pdb(pre_cdr3_c))
print(pose.residue(pre_cdr3_c))

### END SOLUTION
106 H 
Residue 204: CYS:disulfide (CYS, C):
Base: CYS
 Properties: POLYMER PROTEIN CANONICAL_AA SC_ORBITALS METALBINDING DISULFIDE_BONDED ALPHA_AA L_AA
 Variant types: DISULFIDE
 Main-chain atoms:  N    CA   C  
 Backbone atoms:    N    CA   C    O    H    HA 
 Side-chain atoms:  CB   SG  1HB  2HB 
Atom Coordinates:
   N  : -14.312, -6.402, 36.316
   CA : -14.452, -6.929, 37.646
   C  : -15.678, -7.837, 37.662
   O  : -16.599, -7.672, 36.856
   CB : -14.501, -5.824, 38.705
   SG : -15.935, -4.767, 38.638
   H  : -14.9281, -5.66132, 36.0129
   HA : -13.5885, -7.5585, 37.8613
  1HB : -14.4721, -6.27099, 39.699
  2HB : -13.6222, -5.18697, 38.607
Mirrored relative to coordinates in ResidueType: FALSE
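
The two lookups above follow the same two-step pattern: a landmark in some numbering scheme is resolved to a pose index, and the pose index is then mapped to the numbering stored in the PDB file (AHo here). The sketch below replays that pattern with plain dictionaries filled in from the results above. translate_to_aho is a hypothetical helper, and it only knows the two landmarks from this notebook.

```python
# Landmark -> pose index, copied from the get_landmark_resnum() results above.
# Keys are (numbering scheme, chain, residue number in that scheme).
landmarks = {
    ("Kabat", "H", 22): 130,
    ("IMGT", "H", 104): 204,
}

# Pose index -> (AHo residue number, chain), from the pose2pdb() output above.
aho_numbering = {130: (23, "H"), 204: (106, "H")}

# Disulfide pairs (pose numbering) reported in the log when the pose was loaded.
disulfides = {(23, 88), (130, 204)}


def translate_to_aho(scheme, chain, resnum):
    """Hypothetical helper: resolve a landmark to its AHo (resnum, chain)."""
    pose_index = landmarks[(scheme, chain, resnum)]
    return aho_numbering[pose_index]


print(translate_to_aho("Kabat", "H", 22))
print(translate_to_aho("IMGT", "H", 104))

# The two landmark cysteines are disulfide partners of each other.
pair = (landmarks[("Kabat", "H", 22)], landmarks[("IMGT", "H", 104)])
print(pair in disulfides)
```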

Sequence

Let's explore the sequence of this antibody.

In [35]:
ab_seq = ab_info.get_antibody_sequence()
print(ab_seq)

L1_seq = ab_info.get_CDR_sequence_with_stem(antibody.l1, pose)
print("L1", L1_seq)

for i in range(1, 7):
    cdr = antibody.CDRNameEnum(i)
    print(cdr, ab_info.get_CDR_sequence_with_stem(cdr, pose))
DIQMTQSPSSLSASVGDRVTITCRASQDVSTAVAWYQQKPGKAPKLLIYSASFLYSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSYTTPPTFGQGTKVEIKREVQLVESGGGLVQPGGSLRLSCAASGFTISNSGIHWVRQAPGKGLEWVGWIYPTGGATDYADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCARFWWRSFDYWGQGTLVTVSS
L1 RASQDVSTAVA
CDRNameEnum.h1 AASGFTISNSGIH
CDRNameEnum.h2 WIYPTGGATD
CDRNameEnum.h3 ARFWWRSFDY
CDRNameEnum.l1 RASQDVSTAVA
CDRNameEnum.l2 YSASFLYS
CDRNameEnum.l3 QQSYTTPPT
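
As a cross-check, the CDR sequences can be located in the full antibody sequence with ordinary string search, and the positions compared with the loop_info lines printed by AntibodyInfo earlier. This is a pure-Python sketch with the strings copied from the output above. It assumes, as is the case for this pose, that the antibody chains come first in the pose (L, then H) and that each CDR sequence occurs only once. locate is a hypothetical helper.

```python
# Light and heavy chain sequences, copied from get_antibody_sequence() above.
light = ("DIQMTQSPSSLSASVGDRVTITCRASQDVSTAVAWYQQKPGKAPKLLIYSASFLYSGVPS"
         "RFSGSGSGTDFTLTISSLQPEDFATYYCQQSYTTPPTFGQGTKVEIKR")
heavy = ("EVQLVESGGGLVQPGGSLRLSCAASGFTISNSGIHWVRQAPGKGLEWVGWIYPTGGATDY"
         "ADSVKGRFTISADTSKNTAYLQMNSLRAEDTAVYYCARFWWRSFDYWGQGTLVTVSS")
ab_seq = light + heavy

cdr_seqs = {
    "H1": "AASGFTISNSGIH", "H2": "WIYPTGGATD", "H3": "ARFWWRSFDY",
    "L1": "RASQDVSTAVA", "L2": "YSASFLYS", "L3": "QQSYTTPPT",
}


def locate(sequence, fragment):
    """Return the 1-based (start, stop) of the first match, or None if absent."""
    index = sequence.find(fragment)
    if index < 0:
        return None
    return index + 1, index + len(fragment)


positions = {name: locate(ab_seq, seq) for name, seq in cdr_seqs.items()}
for name, span in positions.items():
    print(name, span)
```

The resulting start and stop values agree with the LOOP start/stop numbers in the AntibodyInfo printout (for example, L1 at 24-34 and H3 at 205-214).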

Other AntibodyInfo functions

Use tab completion to find other useful functions. This includes movemap, loops, and fold-tree creation for specific tasks. With ResidueSelectors, this functionality is not quite as useful, but you should know that it is here.

AntibodyInfo Deprecated Functions

All functions are fair game, except these: get_TaskFactory_AllCDRs and get_TaskFactory_OneCDR. These will be removed from AntibodyInfo, as they are extremely specific to a particular antibody modeling task.

Antibody Util and SimpleMetrics

Util functions in Rosetta are stored in the util.hh file in each directory that has one. Within PyRosetta, these come along when you import the namespace. There are many that you should be aware of to make modeling and design tasks easier for custom protocols.

We will go through some examples here.

Function: get_cdr_loops()

The get_cdr_loops function takes a vector1 bool of CDRs. Use the enums to set H3 and L3 to true. A vector1 bool starts out with every entry false.

In [39]:
h3_l3 = rosetta.utility.vector1_bool(6)
print(h3_l3)

h3_l3[antibody.h3] = True
h3_l3[antibody.l3] = True

#Here, we get cdr loops, and set the stem size to 2, 
# so we include 2 residues on either side of the CDR loop (called the stem), to help us in modeling.
h3_l3_loops = antibody.get_cdr_loops(ab_info, pose, h3_l3, 2)
print(h3_l3_loops)
vector1_bool[0, 0, 0, 0, 0, 0]
LOOP  begin  end  cut  skip_rate  extended
LOOP start: 203  stop: 216  cut: 210  size: 14  skip rate: 0  extended?: False

LOOP start: 87  stop: 99  cut: 93  size: 13  skip rate: 0  extended?: False
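
The effect of the stem size on the loop boundaries is simple arithmetic, and can be checked against the output above. This is a pure-Python sketch: add_stem is a hypothetical helper, the CDR boundaries are copied from the AntibodyInfo printout, and the cut point that Rosetta also chooses is not modeled here.

```python
def add_stem(start, stop, stem):
    """Extend a loop by `stem` residues on each side (hypothetical helper)."""
    return start - stem, stop + stem


# CDR boundaries without stem, from the AntibodyInfo printout.
cdr_bounds = {"H3": (205, 214), "L3": (89, 97)}

loops_with_stem = {}
for name, (start, stop) in cdr_bounds.items():
    new_start, new_stop = add_stem(start, stop, 2)
    loops_with_stem[name] = (new_start, new_stop)
    print(name, "start:", new_start, "stop:", new_stop,
          "size:", new_stop - new_start + 1)
```

With a stem of 2, H3 grows from 205-214 to 203-216 (14 residues) and L3 from 89-97 to 87-99 (13 residues), matching the LOOP lines printed by get_cdr_loops.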


Function: select_epitope_residues()

We could use the NeighborhoodResidueSelector, as you have used in the past, to get neighbors. Instead, let's use a general function to get all the epitope residues within an 8 Angstrom distance of the paratope.

In [46]:
epi_residues = antibody.select_epitope_residues(ab_info, pose, 8)
total=0
for i in range(1, len(epi_residues)+1):
    if epi_residues[i]:
        print(i)
        total+=1
print("Total Epitope Residues:", total)
267
270
271
272
273
299
300
301
302
303
304
305
307
308
309
310
313
396
397
398
454
458
Total Epitope Residues: 22

So that was cool. But let's use the wonderful ReturnResidueSubsetSelector to take this ResidueSubset of the epitope residues and store the data as a ResidueSelector!

In [47]:
epi_res_selector = selections.ReturnResidueSubsetSelector(epi_residues)

Now what? Let's use some SimpleMetrics with the selector to calculate something about these epitope residues.

SasaMetric, TotalEnergyMetric, SelectedResiduesPyMOLMetric

In [54]:
import rosetta.core.simple_metrics.metrics as sm
sasa_metric = sm.SasaMetric(epi_res_selector)
print("\nSASA", sasa_metric.calculate(pose))

total_metric = sm.TotalEnergyMetric(epi_res_selector)
print("\nTOTAL RESIDUE ENERGY", total_metric.calculate(pose))

#Let's use a useful metric to select these residues in PyMOL
pymol_metric = sm.SelectedResiduesPyMOLMetric(epi_res_selector)
print("\nSELECTION", pymol_metric.calculate(pose))
SASA 531.9639835627297
core.scoring.ScoreFunctionFactory: SCOREFUNCTION: ref2015

TOTAL RESIDUE ENERGY -2.6964334237038683

SELECTION select rosetta_sele, (chain A and resid 42,45,46,47,48,74,75,76,77,78,79,80,82,83,84,85,88,171,172,173,229,233)

Now let's see which of these residues are most buried in the interface and which have the lowest energy. Note that this is not ddG - we would need to separate the chains for that. We can use the protocols.toolbox.rigid_body.translate function to do so.

Use the PyMOL selection (copy from select...) and let's take a look at them in PyMOL. Then run the code below.

PerResidueSasaMetric

In [73]:
import rosetta.core.simple_metrics.per_residue_metrics as residue_sm
import operator

res_sasa_metric = residue_sm.PerResidueSasaMetric()
res_sasa_metric.set_residue_selector(epi_res_selector)
per_res_sasa = res_sasa_metric.calculate(pose)
#print(per_res_sasa)

#The returned map behaves much like a dictionary: sort its items by SASA, smallest first.
for ele in sorted(per_res_sasa.items(), key=operator.itemgetter(1), reverse=False):
    print(ele)
(300, 0.0)
(303, 0.0)
(305, 0.0)
(304, 0.4468042885105504)
(267, 1.024686682355324)
(398, 1.8244508447514138)
(302, 4.098746729421277)
(396, 4.098746729421277)
(271, 5.380780270728442)
(307, 5.504698647620041)
(301, 7.689850871116937)
(313, 8.068377879289471)
(270, 8.322490061360945)
(458, 14.530261630816586)
(310, 17.873691916125896)
(308, 39.92505047193878)
(454, 46.22431607929343)
(397, 54.69632267990853)
(299, 60.31999848338232)
(309, 75.13475653929288)
(272, 76.34620896564937)
(273, 100.45374379174626)

Cool. So the most buried residues at the interface are 300, 303, 305. Convert those to the PDB chain/num using PDBInfo and take a look at them in PyMOL.
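
The conversion suggested above can be sketched in plain Python. The example picks out the zero-SASA residues and builds a PyMOL selection string in the same format that SelectedResiduesPyMOLMetric printed earlier. In the notebook, the "75 A "-style strings would come from pose.pdb_info().pose2pdb(i); here they are hard-coded for this structure. parse_pose2pdb, pymol_selection, and the selection name buried are hypothetical, and joining several chains with "or" is an assumption.

```python
from operator import itemgetter

# A few pose residue -> SASA pairs, copied from the output above.
per_res_sasa = {300: 0.0, 303: 0.0, 305: 0.0,
                304: 0.4468042885105504, 273: 100.45374379174626}

# pose residue -> pose2pdb()-style string (hard-coded for this structure).
pose2pdb_strings = {300: "75 A ", 303: "78 A ", 305: "80 A "}


def parse_pose2pdb(text):
    """Split a pose2pdb-style string such as '75 A ' into (resnum, chain)."""
    resnum, chain = text.split()
    return int(resnum), chain


def pymol_selection(name, pdb_residues):
    """Build a PyMOL select command from (resnum, chain) pairs."""
    by_chain = {}
    for resnum, chain in pdb_residues:
        by_chain.setdefault(chain, []).append(str(resnum))
    parts = [f"(chain {chain} and resid {','.join(nums)})"
             for chain, nums in by_chain.items()]
    return f"select {name}, " + " or ".join(parts)


# Residues with zero SASA, in sorted order.
buried = [res for res, sasa in sorted(per_res_sasa.items(), key=itemgetter(1))
          if sasa == 0.0]
selection = pymol_selection(
    "buried", [parse_pose2pdb(pose2pdb_strings[res]) for res in buried])
print(buried)
print(selection)
```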

PerResidueEnergyMetric

In [76]:
res_energy_metric = residue_sm.PerResidueEnergyMetric()
res_energy_metric.set_residue_selector(epi_res_selector)

#Calculate the per-residue energies of the epitope residues
per_res_energy = res_energy_metric.calculate(pose)
#print(per_res_energy)

#Sort by energy, lowest first, and print the pose number, PDB number and chain, and energy.
for ele in sorted(per_res_energy.items(), key=operator.itemgetter(1), reverse=False):
    print(ele[0], pose.pdb_info().pose2pdb(ele[0]), ele[1])

Are any of these residues surprisingly high in energy? If so, this may be due to the fact that we are working with a crystal structure that has not been pre-relaxed using the pareto-optimal protocol. Be sure to do this when using PDBs from the data bank for production runs, outputting about 10 models and selecting the lowest-energy model. Or, you could relax within the crystal density. Either works well.

https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0059004

Other Useful Antibody Tools

CDRResidueSelector

In [81]:
from rosetta.protocols.antibody.residue_selector import *

cdr_selector = CDRResidueSelector(ab_info)
cdr_selector.set_cdrs(h3_l3)
sele = cdr_selector.apply(pose)
for i in range(1, len(sele)+1):
    if sele[i]:
        print(i, pose.pdb_info().pose2pdb(i))
89 107 L 
90 108 L 
91 109 L 
92 110 L 
93 111 L 
94 135 L 
95 136 L 
96 137 L 
97 138 L 
205 107 H 
206 108 H 
207 109 H 
208 110 H 
209 111 H 
210 134 H 
211 135 H 
212 136 H 
213 137 H 
214 138 H 
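
Rosetta's vector1 containers, including the residue subset returned by a selector's apply(), are indexed from 1, so a loop that should visit every element runs from 1 through len(sele) inclusive. The sketch below uses a hypothetical pure-Python Vector1 stand-in (not the PyRosetta class) to show what is missed when the upper bound stops one short.

```python
class Vector1:
    """Hypothetical 1-indexed list, standing in for a Rosetta vector1."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        if not 1 <= index <= len(self._values):
            raise IndexError("Vector1 index out of range")
        return self._values[index - 1]


# A toy 5-residue selection in which only the last residue is selected.
sele = Vector1([False, False, False, False, True])

# Stopping at len(sele) skips the final residue.
too_short = [i for i in range(1, len(sele)) if sele[i]]
# Going through len(sele) + 1 visits residues 1..5.
complete = [i for i in range(1, len(sele) + 1) if sele[i]]
print(too_short, complete)
```

For the CDR selection above this makes no visible difference, because the last residue of the pose is not in a CDR, but it matters whenever the final residue can be selected.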

AntibodyRegionSelector

We can use the AntibodyRegionSelector to select a specific region: antigen_region, framework_region, or cdr_region.

In [83]:
region_selector = AntibodyRegionSelector(ab_info)
region_selector.set_region(antibody.antigen_region)
sele = region_selector.apply(pose)

for i in range(1, len(sele)+1):
    if sele[i]:
        print(i, pose.pdb_info().pose2pdb(i))

Other

  • TaskOperations - Antibody specific Task Operations will be covered in the next workshop
  • SnugDock - SnugDock is available in the rosetta.protocols.antibody.snugdock namespace. Both the full protocol, SnugDockProtocol, and the mover, SnugDock, are available and easy to set up through code - but their run time is extremely long.
  • AntibodyModelerProtocol - this is the Antibody_H3 app. Personally, I would use the Rosetta C++ application for this, with the specific options given in the docs; however, you can call this in PyRosetta.
  • AntibodyCDRGrafter - This is the main grafting class used for RosettaAntibody and RosettaAntibodyDesign. It is in the protocols.antibody namespace. Documentation on this mover can be found here (XML or code-level interface is available): https://www.rosettacommons.org/docs/latest/scripting_documentation/RosettaScripts/Movers/movers_pages/antibodies/AntibodyCDRGrafter

References

Please cite these papers when using any of RosettaAntibody.

  • J. Adolf-Bryfogle, O Kalyuzhniy, M Kubitz, B. D. Weitzner, X Hu, Y Adachi, W R. Schief, R L. Dunbrack Jr.,

    • "Rosetta Antibody Design (RAbD): A General Framework for Computational Antibody Design", PLOS Computational Biology (2018)
  • B. D. Weitzner, J. R. Jeliazkov, S. Lyskov*, N. M. Marze, D. Kuroda, R. Frick, J. Adolf-Bryfogle, N. Biswas, R. L. Dunbrack Jr., and J. J. Gray,

    • "Modeling and docking of antibody structures with Rosetta." Nature Protocols 12, 401–416 (2017)
  • B. D. Weitzner, D. Kuroda, N. M. Marze, J. Xu & J. J. Gray,

    • "Blind prediction performance of RosettaAntibody 3.0: Grafting, relaxation, kinematic loop modeling, and full CDR optimization." Proteins 82(8), 1611–1623 (2014)
  • A. Sivasubramanian, A. Sircar, S. Chaudhury & J. J. Gray,

    • "Toward high-resolution homology modeling of antibody Fv regions and application to antibody-antigen docking," Proteins 74(2), 497–514 (2009)