@Author: 张博文 @email:bowen.zhang@xtalpi.com
@Proofread: 吴炜坤 @email:weikun.wu@xtalpi.com
上一节中,我们介绍了Rosetta中坐标的处理方式Rosetta的基本概念,及其顺序性、杠杠性。这一节中,我们将继续介绍:
(1)FoldTree的另两个特性:跳跃性和可切割性。
(2)快速设置特殊任务foldtree的方法
# 初始化pyrosetta,使用pyrosetta必须要做的一件事情
from pyrosetta import init, pose_from_pdb, pose_from_sequence
init()
PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org (C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team. core.init: {0} Checking for fconfig files in pwd and ./rosetta/flags core.init: {0} Rosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54 core.init: {0} command: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database basic.random.init_random_generator: {0} 'RNG device' seed mode, using '/dev/urandom', seed=1987000656 seed_offset=0 real_seed=1987000656 thread_index=0 basic.random.init_random_generator: {0} RandomGenerator:init: Normal mode, seed=1987000656 RG_type=mt19937
上一节中我们已经提到了FoldTree的另外两个特性:跳跃性和可切割性,这一节我们将详细的介绍。
上一节我们已经提过,在FoldTree中当蛋白质有多条链组成时,尽管每一条链中的残基都是共价连接,多条链之间实际上并不存在实际的共价连接。而在FoldTree的概念中,一个树结构只能有一个“根”, 因此在Rosetta中,引入了一种虚拟共价链接,为成为Jump。如同跳跃一般,可以建立一种"超远程"的上下游关系。
以下我们将实践揭开jump的定义方式:
#首先读取data文件中的PDB:1V74文件
pose = pose_from_pdb('./data/1v74.pdb')
core.chemical.GlobalResidueTypeSet: {0} Finished initializing fa_standard residue type set. Created 984 residue types core.chemical.GlobalResidueTypeSet: {0} Total time to initialize 0.640777 seconds. core.import_pose.import_pose: {0} File './data/1v74.pdb' automatically determined to be of type PDB core.conformation.Conformation: {0} [ WARNING ] missing heavyatom: OXT on residue LEU:CtermProteinFull 107 core.conformation.Conformation: {0} [ WARNING ] missing heavyatom: OXT on residue LEU:CtermProteinFull 194
# 输出该pose的FoldTree
print("PDB:1V74: ",pose.fold_tree())
print(f"The size of the foldtree (number of edges) is {pose.fold_tree().size()}") #Foldtree 的尺寸(edge的个数)
print(f"The number of the jump is {pose.fold_tree().num_jump()}") #Foldtree 的jump点的个数
PDB:1V74: FOLD_TREE EDGE 1 107 -1 EDGE 1 108 1 EDGE 108 194 -1 The size of the foldtree (number of edges) is 3 The number of the jump is 1
可以看到该Pose中含有3个Edge(含一个Jump Edge)。与上一章介绍的一样,其中,从1-107、108-194最后一位为-1,因此这两个edge内所有残基为共价连接,而1和108号残基,即该蛋白的A链和B链的N端由一个Jump连接,即一个虚拟的“共价键”。
将这个FoldTree进行可视化的话,即如上图所示。此时A链的1号残基为B链108-194残基的上游,同时也是2-107号残基的上游。 因此jump1的虚拟链接定义了ChainA和ChainB之间的上下游关系。
此时,FoldTree树结构才只会有一个根部起点!!Root=PoseID:1
思考:
设置带Jump点的foldtree方式也是通过FoldTree下的add_edge函数完成,用以下的例子,我们尝试复现1.1节中展示的foldtree。
先来看一个例子:
from pyrosetta.rosetta.core.kinematics import FoldTree
# 完整定义试试:
ft = FoldTree() #设置一个空的foldtree对象
ft.add_edge(start=1, stop=107, label=-1)
ft.add_edge(start=108, stop=194, label=-1)
ft.add_edge(start=1, stop=108, label=1) # add jump
ft.check_fold_tree() # 检查foldtree的有效性;
print(ft)
core.kinematics.FoldTree: {0} [ ERROR ] Bad fold tree at edge EDGE 108 194 -1 core.kinematics.FoldTree: {0} [ ERROR ] Start residue 108 not built yet. core.kinematics.FoldTree: {0} [ ERROR ] FOLD_TREE EDGE 1 107 -1 EDGE 108 194 -1 EDGE 1 108 1 FOLD_TREE EDGE 1 107 -1 EDGE 108 194 -1 EDGE 1 108 1
笔者解读: 这是因为在构建foldtree时是有顺序性的。每一个子节点必须先定义父节点才能生效。在第一种做法中,108号残基在创建时没有找到他可以连接的父节点。而在第二种做法中,先定义了107-108之间的jump,因此108号残基的foldtree才是有效的!
我们先定义里两个edge之后再去定义jump。但发现foldtree是无效的!
from pyrosetta.rosetta.core.kinematics import FoldTree
# 完整定义试试:
ft = FoldTree() #设置一个空的foldtree对象
ft.add_edge(start=1, stop=107, label=-1)
ft.add_edge(start=1, stop=108, label=1) # add jump
ft.add_edge(start=108, stop=194, label=-1)
ft.check_fold_tree() # 检查foldtree的有效性;
True
而如果先构建第一个edge,然后构建jump,在构建第二个edge时,foldtree是有效的。这种现象产生原因是?
jump点的设置练习: 读者可根据以下foldtree的逻辑图创建一个foldtree么?(提示注意jump编号)
# 答案
from pyrosetta.rosetta.core.kinematics import FoldTree
# 完整定义试试:
ft = FoldTree() #设置一个空的foldtree对象
# 正确答案:
ft.add_edge(start=1, stop=107, label=-1)
ft.add_edge(start=107, stop=108, label=1) # add jump1
ft.add_edge(start=108, stop=157, label=-1)
ft.add_edge(start=88, stop=158, label=2) # add jump2
ft.add_edge(start=158, stop=194, label=-1)
ft.check_fold_tree() # 检查foldtree的有效性;
True
关键点在于按照jump的流向构建FoldTree
FoldTree是可以在edge中进行对点的切割,即设置cutpoint,在快速设置loop时十分有用。
总体来说,Cutpoint就是FoldTree中的不连续点,相对于peptide edge的连续性来说,cutpoint的出现相当于一把剪刀,将非peptide edge内氨基酸的连接关系切断,也就是FoldTree内EDGE的上下游关系在此处被切断。
# demo for add a cutpoint
cutpoint_pose = pose_from_sequence('AAAAAAAAAA')
pose_foldtree = cutpoint_pose.fold_tree()
print(pose_foldtree)
FOLD_TREE EDGE 1 10 -1
一个最简单的cutpoint的设置例子,本质上cutpoint就是原有共价连接的地方,不构建应有的链接
(图片来源: 晶泰科技团队)
from pyrosetta.rosetta.core.kinematics import FoldTree
# 完整定义试试:
ft = FoldTree() #设置一个空的foldtree对象
# 正确答案:
ft.add_edge(start=1, stop=5, label=-1)
ft.add_edge(start=5, stop=10, label=1)
ft.add_edge(start=10, stop=6, label=-1)
ft.check_fold_tree() # 检查foldtree的有效性;
# 检查foldtree的cutpoint数量。
ft.num_cutpoint()
cutpoint_pose.fold_tree(ft)
我们可以使用对应pose的FoldTree的num_cutpoint属性来看cutpoint的数量,并通过cutpoint属性来看对应cutpoint的位置,以及is_cutpoint判断是否为cutpoint
#看对应pose中cutpoint的属性
print(f"The FoldTree of the Pose: {cutpoint_pose.fold_tree()}")
print(f"The number of the cutpoint in Pose is : {cutpoint_pose.fold_tree().num_cutpoint()}") #说明Pose的cutpoint只有一个
print(f"The cutpoint in Pose is : {cutpoint_pose.fold_tree().cutpoint(1)}") #说明Pose中的断点在5号残基处
The FoldTree of the Pose: FOLD_TREE EDGE 1 5 -1 EDGE 5 10 1 EDGE 10 6 -1 The number of the cutpoint in Pose is : 1 The cutpoint in Pose is : 5
尝试改变7号残基的构象,看看效果: 从第5号残基处构象被切断。
# set phi3
cutpoint_pose.dump_pdb('./data/cutpoint_raw.pdb')
cutpoint_pose.set_phi(7, 69)
cutpoint_pose.set_psi(7, 60)
cutpoint_pose.dump_pdb('./data/cutpoint_pertuber.pdb')
True
一个设置cutpoint更快的方法: 直接使用FoldTree的new_jump函数,就可以完成上面伸所示的foldtree构建
# demo for add a cutpoint
cutpoint_pose = pose_from_sequence('AAAAAAAAAA')
pose_foldtree = cutpoint_pose.fold_tree()
pose_foldtree.new_jump(jump_pos1=5, jump_pos2=10, cutpoint=5)
#看对应pose中cutpoint的属性
print(f"The FoldTree of the Pose: {cutpoint_pose.fold_tree()}")
print(f"The number of the cutpoint in Pose is : {cutpoint_pose.fold_tree().num_cutpoint()}") #说明Pose的cutpoint只有一个
print(f"The cutpoint in Pose is : {cutpoint_pose.fold_tree().cutpoint(1)}") #说明Pose中的断点在5号残基处
The FoldTree of the Pose: FOLD_TREE EDGE 1 5 -1 EDGE 5 10 1 EDGE 10 6 -1 The number of the cutpoint in Pose is : 1 The cutpoint in Pose is : 5
在蛋白质建模的不同任务中,不同的任务通常需要自定义特定的FoldTree:
主要调用的函数为: setup_foldtree
from pyrosetta.rosetta.protocols.docking import setup_foldtree
from pyrosetta.rosetta.utility import vector1_int
# Docking FoldTree
#首先读取data文件中的PDB:1V74文件
pose = pose_from_pdb('./data/1v74.pdb')
print(pose.pdb_info())
core.import_pose.import_pose: {0} File './data/1v74.pdb' automatically determined to be of type PDB core.conformation.Conformation: {0} [ WARNING ] missing heavyatom: OXT on residue LEU:CtermProteinFull 107 core.conformation.Conformation: {0} [ WARNING ] missing heavyatom: OXT on residue LEU:CtermProteinFull 194 PDB file name: ./data/1v74.pdb Pose Range Chain PDB Range | #Residues #Atoms 0001 -- 0107 A 0591 -- 0697 | 0107 residues; 01728 atoms 0108 -- 0194 B 0001 -- 0087 | 0087 residues; 01425 atoms TOTAL | 0194 residues; 03153 atoms
读入的Pose中含有两条链,一条是A一条是B。使用setup_foldtree时我们需要先定义partner概念。
partner参数的基本形式是"X_Y", 含义是我需要构建一个X链为第一个刚体,Y为第二个刚体的。
partner参数支持3条链以上: 如“A_BC”, 含义是A作为一个可旋转平移刚体,BC链一同作为另外一个可旋转平移刚体。
setup_foldtree因此根据partner参数去构建相应的Docking FoldTree,并且将"_"中间设定为Jump1,因此一般Docking任务都会将movable_jumps参数中的vector1_int设置为1.
# 默认设置可以在刚体中移动的jumps为1
movable_jumps_v = vector1_int()
movable_jumps_v.append(1)
# 设置foldtree
setup_foldtree(pose, partner_chainID='A_B', movable_jumps=movable_jumps_v)
print(pose.fold_tree())
FOLD_TREE EDGE 1 88 -1 EDGE 88 107 -1 EDGE 88 158 1 EDGE 158 108 -1 EDGE 158 194 -1
思考:画出上面的图, 为什么setup_foldtree要这样去设置FoldTree呢?
主要调用的函数为: setup_foldtree
# 创建序列Pose
pose = pose_from_sequence('AAAAAAAAAAAAAAAAAAAAAAAAAAA')
print(pose.pdb_info())
PDB file name: AAAAAAAA Pose Range Chain PDB Range | #Residues #Atoms 0001 -- 0027 A 0001 -- 0027 | 0027 residues; 00273 atoms TOTAL | 0027 residues; 00273 atoms
假设我现在要对10-20这段区域的骨架进行随机化采样,又不希望这段区域的自由度变化对1-9, 21-27的区域造成影响。
这就是Loop modeling FoldTree需要做的事情。
from pyrosetta.rosetta.protocols.loops import Loops, Loop, fold_tree_from_loops
# define loops;
loop_fold_tree = FoldTree()
loops = Loops()
loop = Loop(10, 20, 15) # start_res, end_res, cut_res
loops.add_loop(loop)
# setup loop modelling foldtree;
fold_tree_from_loops(pose, loops, loop_fold_tree)
print(loop_fold_tree)
pose.fold_tree(loop_fold_tree)
FOLD_TREE EDGE 1 9 -1 EDGE 9 15 -1 EDGE 9 21 1 EDGE 21 16 -1 EDGE 21 27 -1
思考:画出上面的图,思考为什么fold_tree_from_loops要这样去设置FoldTree呢?
练习
如果我已知有一段功能结构区域范围,如何设置foldtree,在采样时避免这段区域发生变化呢? (难度系数: **)
如何设定一个可以在Docking中允许loops采样的FoldTree呢?(难度系数: ****)