@Author: 槐喆 @email:zhe.huai@xtalpi.com
@Proofread: 吴炜坤 @email:weikun.wu@xtalpi.com
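A residue selector (ResidueSelector) plays a central role in Rosetta modeling: it picks a subset of residues out of a protein structure (Pose). That subset then drives the logic of later modeling steps. For example, it can define the degrees of freedom for design or sampling (a ResidueSelector can pick the residues within 5 Å of the center of the protein core, so that side-chain energy minimization or other structure refinement is applied to them), and it can be combined with SimpleMetrics, Filters, and similar tools to compute protein properties or parameters.
Note: the ResidueSelector concept is simple and beginner-friendly, so this chapter is easy to follow.
In PyRosetta, once a ResidueSelector is defined, calling apply (which performs the selection) returns the selected subset of residues. The subset is stored in a vector1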
# Import the chain selector
from pyrosetta import pose_from_pdb, init
from pyrosetta.rosetta.core.select.residue_selector import ChainSelector
init()
# Read a PDB file into a Pose object (hepatocyte growth factor antibody, PDB: 6LZ9)
pose = pose_from_pdb('./data/6LZ9_H_L.pdb')
PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.31+release.c7009b3115c22daa9efe2805d9d1ebba08426a54 2021-08-07T10:04:12] retrieved from: http://www.pyrosetta.org (C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team. core.init: {0} Checking for fconfig files in pwd and ./rosetta/flags core.init: {0} Rosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r292 2021.31+release.c7009b3115c c7009b3115c22daa9efe2805d9d1ebba08426a54 http://www.pyrosetta.org 2021-08-07T10:04:12 core.init: {0} command: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database basic.random.init_random_generator: {0} 'RNG device' seed mode, using '/dev/urandom', seed=1885576095 seed_offset=0 real_seed=1885576095 thread_index=0 basic.random.init_random_generator: {0} RandomGenerator:init: Normal mode, seed=1885576095 RG_type=mt19937 core.chemical.GlobalResidueTypeSet: {0} Finished initializing fa_standard residue type set. Created 983 residue types core.chemical.GlobalResidueTypeSet: {0} Total time to initialize 0.651952 seconds. core.import_pose.import_pose: {0} File './data/6LZ9_H_L.pdb' automatically determined to be of type PDB core.conformation.Conformation: {0} Found disulfide between residues 21 94 core.conformation.Conformation: {0} current variant for 21 CYS core.conformation.Conformation: {0} current variant for 94 CYS core.conformation.Conformation: {0} current variant for 21 CYD core.conformation.Conformation: {0} current variant for 94 CYD core.conformation.Conformation: {0} Found disulfide between residues 141 206 core.conformation.Conformation: {0} current variant for 141 CYS core.conformation.Conformation: {0} current variant for 206 CYS core.conformation.Conformation: {0} current variant for 141 CYD core.conformation.Conformation: {0} current variant for 206 CYD
# First, look at the basic residue information of the antibody:
print(pose.pdb_info())
PDB file name: ./data/6LZ9_H_L.pdb
Pose Range    Chain  PDB Range        |  #Residues       #Atoms
0001 -- 0081  H      0002 -- 0082     |  0081 residues;  01283 atoms
0082 -- 0082  H      0082A -- 0082A   |  0001 residues;  00011 atoms
0083 -- 0083  H      0082B -- 0082B   |  0001 residues;  00011 atoms
0084 -- 0084  H      0082C -- 0082C   |  0001 residues;  00019 atoms
0085 -- 0102  H      0083 -- 0100     |  0018 residues;  00271 atoms
0103 -- 0103  H      0100A -- 0100A   |  0001 residues;  00010 atoms
0104 -- 0104  H      0100B -- 0100B   |  0001 residues;  00021 atoms
0105 -- 0105  H      0100C -- 0100C   |  0001 residues;  00021 atoms
0106 -- 0106  H      0100D -- 0100D   |  0001 residues;  00010 atoms
0107 -- 0107  H      0100E -- 0100E   |  0001 residues;  00017 atoms
0108 -- 0118  H      0101 -- 0111     |  0011 residues;  00160 atoms
0119 -- 0223  L      0001 -- 0105     |  0105 residues;  01600 atoms
TOTAL                                 |  0223 residues;  03434 atoms
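print(f'Number of chains in the antibody: {pose.num_chains()}')
print(f'Number of residues in the antibody: {pose.total_residue()}')
Number of chains in the antibody: 2
Number of residues in the antibody: 223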
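The antibody therefore has two chains: chain H spans Pose residues 1-118 and chain L spans Pose residues 119-223.
# Select the antibody heavy chain (PDB chain ID "H"):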
select_heavy_chain = ChainSelector('H')
selected = select_heavy_chain.apply(pose)
print(selected)
vector1_bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
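Interpreting the result
Key point 1: in the vector1_bool, selected residues are reported as 1 and unselected residues as 0.
Key point 2: the vector1_bool is ordered by Pose numbering (starting from 1), so the heavy chain occupies positions 1 to n and the light chain occupies positions n+1 to 223 (here n = 118).
Verify that the selector is correct: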
index_list = [index+1 for index, i in enumerate(selected) if i == 1]
print(index_list)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118]
The selector correctly picked every residue of the heavy chain.
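The mask-to-index conversion used in the verification step can be wrapped in a small helper. This is a minimal sketch in plain Python: `selected_pose_indices` is a hypothetical name, not part of PyRosetta, and the toy mask below stands in for a real vector1_bool (the helper only assumes that the mask can be iterated, as the list comprehension above already does).

```python
def selected_pose_indices(mask):
    """Return the 1-based Pose indices of the selected positions in a boolean mask."""
    # enumerate(..., start=1) mirrors Pose numbering, which starts at 1.
    return [index for index, is_selected in enumerate(mask, start=1) if is_selected]


# Toy mask standing in for a vector1_bool: three selected residues, then two unselected.
toy_mask = [True, True, True, False, False]
print(selected_pose_indices(toy_mask))
```

With the heavy-chain mask from this section, the helper would return the same list of indices 1 to 118 shown above.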
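PyRosetta ships a built-in simple metric, SelectedResiduesPyMOLMetric, that writes out a PyMOL selection command for the selected residues, so they can be displayed directly.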
from pyrosetta.rosetta.core.simple_metrics.metrics import SelectedResiduesPyMOLMetric
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(select_heavy_chain)
prefix = 'heavy_chain_'
pymol_selected.apply(pose, prefix)
from pyrosetta.rosetta.core.simple_metrics import get_sm_data
sm_data = get_sm_data(pose)
string_metric = sm_data.get_string_metric_data()
string_metric['heavy_chain_pymol_selection']
'select rosetta_sele, (chain H and resid 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,82A,82B,82C,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,100A,100B,100C,100D,100E,101,102,103,104,105,106,107,108,109,110,111)'
Step 1: paste the selection command above into the PyMOL command line.
Step 2: display the selection as sticks: show sticks, rosetta_sele
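To make the structure of that PyMOL command explicit, here is a minimal sketch that rebuilds a string in the same format from a list of (chain, residue ID) pairs. It only mimics the format of the command printed above; `pymol_select_command` is a hypothetical helper and not how SelectedResiduesPyMOLMetric is implemented.

```python
def pymol_select_command(residues, name="rosetta_sele"):
    """Build a PyMOL 'select' command from (chain, residue_id) pairs.

    Residues are grouped per chain, in order of first appearance.
    """
    by_chain = {}
    for chain, resid in residues:
        by_chain.setdefault(chain, []).append(str(resid))
    groups = [
        f"(chain {chain} and resid {','.join(resids)})"
        for chain, resids in by_chain.items()
    ]
    # Multiple chains are joined with 'or'; an empty selection leaves the expression blank.
    return f"select {name}, " + " or ".join(groups)


print(pymol_select_command([("H", "82"), ("H", "82A"), ("L", "1")]))
```

Residue IDs are kept as strings so that insertion codes such as 82A survive unchanged.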
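By function, residue selectors fall into three broad categories: logical selectors, conformation-independent selectors, and conformation-dependent selectors.
Below we walk through the residue selectors that are most useful in practice.
This section gives brief examples only; the next section covers the individual APIs in detail.
The first category is the logical selectors. They are easy to understand: following the Not, And, and Or logical relations, they combine existing selectors into a new selection. In Rosetta, the selectors that implement this logic are NotResidueSelector, AndResidueSelector, and OrResidueSelector. Examples follow: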
# Continue with the antibody pose loaded earlier.
# First define the chain selectors:
select_heavy_chain = ChainSelector('H')
select_light_chain = ChainSelector('L')
select_light_chain.apply(pose)
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# Visualize: select the light chain
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(select_light_chain)
prefix = 'light_chain_'
pymol_selected.apply(pose, prefix)
string_metric = sm_data.get_string_metric_data()
string_metric['light_chain_pymol_selection']
'select rosetta_sele, (chain L and resid 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105)'
# Example 1: select the light chain OR the heavy chain
from pyrosetta.rosetta.core.select.residue_selector import OrResidueSelector
light_or_heavy = OrResidueSelector(select_heavy_chain, select_light_chain)
residue_selector = light_or_heavy.apply(pose)
print(residue_selector)
vector1_bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# Visualize: light chain OR heavy chain
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(light_or_heavy)
prefix = 'light_or_heavy_'
pymol_selected.apply(pose, prefix)
string_metric = sm_data.get_string_metric_data()
string_metric['light_or_heavy_pymol_selection']
'select rosetta_sele, (chain H and resid 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,82A,82B,82C,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,100A,100B,100C,100D,100E,101,102,103,104,105,106,107,108,109,110,111) or (chain L and resid 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105)'
# Example 2: select the heavy chain AND the light chain
from pyrosetta.rosetta.core.select.residue_selector import AndResidueSelector
light_and_heavy = AndResidueSelector(select_heavy_chain, select_light_chain)
residue_selector = light_and_heavy.apply(pose)
print(residue_selector)
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Visualize: heavy chain AND light chain
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(light_and_heavy)
prefix = 'light_and_heavy_'
pymol_selected.apply(pose, prefix)
string_metric = sm_data.get_string_metric_data()
string_metric['light_and_heavy_pymol_selection']
'select rosetta_sele, '
The heavy and light chains share no residues, so the selection is empty.
# Example 3: the Not selector:
from pyrosetta.rosetta.core.select.residue_selector import NotResidueSelector
not_heavy = NotResidueSelector(select_heavy_chain)
residue_selector = not_heavy.apply(pose)
print(residue_selector)
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# Visualize: everything that is not the heavy chain
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(not_heavy)
prefix = 'not_heavy_'
pymol_selected.apply(pose, prefix)
string_metric = sm_data.get_string_metric_data()
string_metric['not_heavy_pymol_selection']
'select rosetta_sele, (chain L and resid 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105)'
# Example 4: select the entire Pose
from pyrosetta.rosetta.core.select.residue_selector import TrueResidueSelector
true = TrueResidueSelector()
residue_selector = true.apply(pose)
print(residue_selector)
vector1_bool[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
# Visualize: the entire Pose
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(true)
prefix = 'entire_pose_'
pymol_selected.apply(pose, prefix)
string_metric = sm_data.get_string_metric_data()
string_metric['entire_pose_pymol_selection']
'select rosetta_sele, (chain H and resid 2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,82A,82B,82C,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,100A,100B,100C,100D,100E,101,102,103,104,105,106,107,108,109,110,111) or (chain L and resid 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105)'
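The Or, And, and Not results above follow from element-wise boolean logic on the per-residue masks. The sketch below reproduces that logic on plain Python lists sized like the two chains of this antibody (118 and 105 residues). The function names are hypothetical and this is an illustration of the idea, not the PyRosetta implementation.

```python
def _check_same_length(a, b):
    if len(a) != len(b):
        raise ValueError("masks must cover the same number of residues")


def mask_or(a, b):
    """Residue is selected if it is selected in either mask."""
    _check_same_length(a, b)
    return [bool(x) or bool(y) for x, y in zip(a, b)]


def mask_and(a, b):
    """Residue is selected only if it is selected in both masks."""
    _check_same_length(a, b)
    return [bool(x) and bool(y) for x, y in zip(a, b)]


def mask_not(a):
    """Invert the selection."""
    return [not bool(x) for x in a]


# Masks shaped like the heavy chain (Pose 1-118) and light chain (Pose 119-223).
heavy = [True] * 118 + [False] * 105
light = [False] * 118 + [True] * 105

print(sum(mask_or(heavy, light)))   # number of residues in heavy OR light
print(sum(mask_and(heavy, light)))  # number of residues in heavy AND light
print(mask_not(heavy) == light)     # NOT heavy equals light for a two-chain pose
```

Because the two chains are disjoint, the And mask has no selected residues, which matches the empty PyMOL selection in Example 2.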
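Conformation-independent selectors are defined purely by residue attributes, such as the residue index or the residue name, without reference to any particular conformation. Two brief examples follow.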
3.2.1. ResidueIndexSelector
This selector picks residues by index. It accepts Pose numbering, PDB numbering, and residue ranges.
from pyrosetta.rosetta.core.select.residue_selector import ResidueIndexSelector
# Select by specific Pose indices:
pose_index_selector = ResidueIndexSelector('40,42,44')
residue_selector = pose_index_selector.apply(pose)
print(residue_selector)
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Visualize: the specific residue positions
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(pose_index_selector)
prefix = 'index_select_'
pymol_selected.apply(pose, prefix)
string_metric = sm_data.get_string_metric_data()
string_metric['index_select_pymol_selection']
'select rosetta_sele, (chain H and resid 41,43,45)'
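The selector was given Pose indices 40, 42, 44, yet the PyMOL command lists residues 41, 43, 45, because PyMOL uses PDB numbering and chain H starts at PDB residue 2. The sketch below makes the offset explicit with a table hand-copied from the pose.pdb_info() printout earlier in this section. `SEGMENTS` and `pose_to_pdb` are hypothetical names for illustration; in a live session the PDBInfo object itself is the authoritative source for such conversions.

```python
# (pose_start, pose_end, chain, pdb_start, insertion_code), copied by hand
# from the pose.pdb_info() output for 6LZ9_H_L.pdb shown earlier.
SEGMENTS = [
    (1, 81, "H", 2, ""),
    (82, 82, "H", 82, "A"),
    (83, 83, "H", 82, "B"),
    (84, 84, "H", 82, "C"),
    (85, 102, "H", 83, ""),
    (103, 103, "H", 100, "A"),
    (104, 104, "H", 100, "B"),
    (105, 105, "H", 100, "C"),
    (106, 106, "H", 100, "D"),
    (107, 107, "H", 100, "E"),
    (108, 118, "H", 101, ""),
    (119, 223, "L", 1, ""),
]


def pose_to_pdb(pose_index):
    """Map a 1-based Pose index to (chain, PDB residue label) using SEGMENTS."""
    for pose_start, pose_end, chain, pdb_start, icode in SEGMENTS:
        if pose_start <= pose_index <= pose_end:
            return chain, f"{pdb_start + pose_index - pose_start}{icode}"
    raise ValueError(f"Pose index {pose_index} is outside the range 1-223")


print([pose_to_pdb(i) for i in (40, 42, 44)])
```

The same table also explains why the insertion-code residues (82A to 82C, 100A to 100E) shift the offset further along chain H.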
# Example 1: select by specific PDB numbers; note that the PDB chain ID must be appended.
pdb_index_selector = ResidueIndexSelector('62H,63H,64H')
residue_selector = pdb_index_selector.apply(pose)
print(residue_selector)
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Visualize: residues selected by PDB number
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(pdb_index_selector)
prefix = 'pdb_index_select_'
pymol_selected.apply(pose, prefix)
string_metric = sm_data.get_string_metric_data()
string_metric['pdb_index_select_pymol_selection']
'select rosetta_sele, (chain H and resid 62,63,64)'
# Example 2: select a range given in PDB numbering.
range_selector = ResidueIndexSelector('42H-60H')
residue_selector = range_selector.apply(pose)
print(residue_selector)
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Visualize: a range of residues
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(range_selector)
prefix = 'range_select_'
pymol_selected.apply(pose, prefix)
string_metric = sm_data.get_string_metric_data()
string_metric['range_select_pymol_selection']
'select rosetta_sele, (chain H and resid 42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60)'
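The three index strings used in this subsection ('40,42,44', '62H,63H,64H', '42H-60H') share one simple grammar: comma-separated entries, each either a single residue or a start-end range, with an optional chain letter marking PDB numbering. The sketch below parses only that subset. It is a simplified illustration with hypothetical function names, it ignores insertion codes and negative residue numbers, and it is not Rosetta's own parser.

```python
import re

# One residue token: digits followed by an optional single chain letter.
_RESIDUE = re.compile(r"^(\d+)([A-Za-z]?)$")


def parse_residue(token):
    """Return (number, chain); chain is None for Pose numbering."""
    match = _RESIDUE.match(token.strip())
    if match is None:
        raise ValueError(f"unsupported residue token: {token!r}")
    number, chain = match.groups()
    return int(number), (chain or None)


def parse_index_string(spec):
    """Split an index string into ('single', residue) and ('range', start, end) entries."""
    entries = []
    for part in spec.split(","):
        if "-" in part:
            start, end = part.split("-")
            entries.append(("range", parse_residue(start), parse_residue(end)))
        else:
            entries.append(("single", parse_residue(part)))
    return entries


for spec in ("40,42,44", "62H,63H,64H", "42H-60H"):
    print(spec, "->", parse_index_string(spec))
```

A chain of None marks an entry as a Pose index, which is why '40,42,44' and '62H,63H,64H' address different numbering schemes.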
3.2.2. ResidueNameSelector
This selector picks residues by residue name:
# Example 1: select by a single residue name:
from pyrosetta.rosetta.core.select.residue_selector import ResidueNameSelector
resname_selector = ResidueNameSelector('PHE')
residue_selector = resname_selector.apply(pose)
print(residue_selector)
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
# Visualize: residues selected by residue name
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(resname_selector)
prefix = 'resname_select_'
pymol_selected.apply(pose, prefix)
string_metric = sm_data.get_string_metric_data()
string_metric['resname_select_pymol_selection']
'select rosetta_sele, (chain H and resid 27,79,100) or (chain L and resid 21,49,62,83,87,96,98)'
# Example 2: select by several residue names:
resname_selector = ResidueNameSelector('PHE,ASN')
residue_selector = resname_selector.apply(pose)
print(residue_selector)
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
# Visualize: residues matching any of the given residue names
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(resname_selector)
prefix = 'multi_resname_select_'
pymol_selected.apply(pose, prefix)
string_metric = sm_data.get_string_metric_data()
string_metric['multi_resname_select_pymol_selection']
'select rosetta_sele, (chain H and resid 27,54,60,76,79,100) or (chain L and resid 21,31,34,49,62,77,83,87,96,98)'
# Example 3: select residues that carry a modification (variant)
resname_selector = ResidueNameSelector('CYS')
residue_selector = resname_selector.apply(pose)
print(residue_selector)
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Something seems wrong: the selector did not pick the disulfide-bonded cysteines we wanted. Let's print residue 21 to see what happened.
print(pose.residue(21))
Residue 21: CYS:disulfide (CYS, C): Base: CYS Properties: POLYMER PROTEIN CANONICAL_AA SC_ORBITALS METALBINDING DISULFIDE_BONDED ALPHA_AA L_AA Variant types: DISULFIDE Main-chain atoms: N CA C Backbone atoms: N CA C O H HA Side-chain atoms: CB SG 1HB 2HB Atom Coordinates: N : 39.126, 55.553, 42.324 CA : 37.869, 55.182, 41.689 C : 37.774, 53.665, 41.73 O : 38.654, 52.976, 41.209 CB : 37.81, 55.713, 40.253 SG : 36.265, 55.41, 39.34 H : 39.995, 55.343, 41.854 HA : 37.051, 55.626, 42.256 1HB : 37.967, 56.792, 40.257 2HB : 38.614, 55.268, 39.667 Mirrored relative to coordinates in ResidueType: FALSE
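Interpreting the result
Selecting with the residue name CYS does not match the disulfide-bonded cysteines, because in Rosetta a cysteine that forms a disulfide bond is named CYS:disulfide. Let's select again with that name.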
resname_selector = ResidueNameSelector('CYS:disulfide')
residue_selector = resname_selector.apply(pose)
print(residue_selector)
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Visualize: the modified residues
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(resname_selector)
prefix = 'ss_select_'
pymol_selected.apply(pose, prefix)
string_metric = sm_data.get_string_metric_data()
string_metric['ss_select_pymol_selection']
'select rosetta_sele, (chain H and resid 22,92) or (chain L and resid 23,88)'
Interpreting the result
The disulfide-bonded cysteines are now selected correctly. They are located at 22H, 92H, 23L, and 88L.
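The behaviour observed here is consistent with the name being compared against the full residue type name, so 'CYS' does not match 'CYS:disulfide'. The sketch below illustrates that on a toy excerpt of this pose. The cysteine positions come from the disulfide log lines printed at load time, the PHE at Pose 26 is inferred from the PHE selection output above, and both helper functions are hypothetical. The base-name variant is shown only as a contrast, not as a claim about what the library offers.

```python
# Toy excerpt: Pose index -> full residue type name.
residue_names = {
    21: "CYS:disulfide",
    26: "PHE",
    94: "CYS:disulfide",
    141: "CYS:disulfide",
    206: "CYS:disulfide",
}


def select_by_full_name(names, wanted):
    """Exact match on the full type name, as observed in the example above."""
    return sorted(index for index, name in names.items() if name == wanted)


def select_by_base_name(names, wanted):
    """Hypothetical alternative: compare only the part before the variant suffix."""
    return sorted(index for index, name in names.items() if name.split(":")[0] == wanted)


print(select_by_full_name(residue_names, "CYS"))            # no exact match
print(select_by_full_name(residue_names, "CYS:disulfide"))  # the four cysteines
print(select_by_base_name(residue_names, "CYS"))            # the same four, by base name
```

When a name-based selection comes back empty, printing the residue as done above is a quick way to check which full name the residue actually carries.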
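Conformation-dependent selectors, as the name suggests, depend on the actual conformation of the structure. They are defined in terms of dihedral angles, secondary structure, hydrogen bonds, neighbor counts, interaction interfaces, symmetry, and so on. NeighborhoodResidueSelector serves as a brief example here.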
3.3.1. NeighborhoodResidueSelector
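Selects residues near a set of focus residues; the default cutoff is 10 Å. The examples below show two usages: the first returns every residue within the radius, including the focus residue itself, and the second returns only the neighbors, excluding the focus residue.
# Example: select all residues within 10 Å of residue 42 on chain H (PDB numbering), including residue 42 itself: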
from pyrosetta.rosetta.core.select.residue_selector import NeighborhoodResidueSelector, ResidueIndexSelector
residue1_selector = ResidueIndexSelector('42H')
nbr_selector = NeighborhoodResidueSelector(residue1_selector, 10.0, True)  # True: include residue 42H itself.
nbr_selector.apply(pose)
core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] ################ Cloning pose and building neighbor graph ################ core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] Ensure that pose is either scored or has update_residue_neighbors() called core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] before using NeighborhoodResidueSelector for maximum performance! core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] ##########################################################################
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Visualize: all residues within 10 Å of residue 42 on chain H (PDB numbering), including residue 42
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(nbr_selector)
prefix = 'nbr_select_'
pymol_selected.apply(pose, prefix)
core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] ################ Cloning pose and building neighbor graph ################ core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] Ensure that pose is either scored or has update_residue_neighbors() called core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] before using NeighborhoodResidueSelector for maximum performance! core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] ##########################################################################
string_metric = sm_data.get_string_metric_data()
string_metric['nbr_select_pymol_selection']
'select rosetta_sele, (chain H and resid 39,40,41,42,43,44,88,89)'
# Example: select all residues within 10 Å of residue 42 on chain H (PDB numbering), excluding residue 42 itself:
nbr_selector = NeighborhoodResidueSelector(residue1_selector, 10.0, False)  # False: exclude residue 42H itself.
nbr_selector.apply(pose)
core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] ################ Cloning pose and building neighbor graph ################ core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] Ensure that pose is either scored or has update_residue_neighbors() called core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] before using NeighborhoodResidueSelector for maximum performance! core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] ##########################################################################
vector1_bool[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
# Visualize: all residues within 10 Å of residue 42 on chain H (PDB numbering), excluding residue 42
pymol_selected = SelectedResiduesPyMOLMetric()
pymol_selected.set_residue_selector(nbr_selector)
prefix = 'nbr_noself_select_'
pymol_selected.apply(pose, prefix)
core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] ################ Cloning pose and building neighbor graph ################ core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] Ensure that pose is either scored or has update_residue_neighbors() called core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] before using NeighborhoodResidueSelector for maximum performance! core.select.residue_selector.NeighborhoodResidueSelector: {0} [ WARNING ] ##########################################################################
string_metric = sm_data.get_string_metric_data()
string_metric['nbr_noself_select_pymol_selection']
'select rosetta_sele, (chain H and resid 39,40,41,43,44,88,89)'
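The effect of the third argument can be illustrated with a distance-based sketch in plain Python. It assumes one representative point per residue and made-up coordinates; the real selector works on the Pose's own atoms and neighbor graph, so this is only a conceptual analogue. `neighborhood` and `toy_coords` are hypothetical names.

```python
import math


def neighborhood(coords, focus, distance=10.0, include_focus=True):
    """Return sorted residue indices within `distance` of any focus residue.

    coords: dict mapping residue index -> (x, y, z) of one representative point.
    focus:  iterable of residue indices that define the center(s).
    """
    focus = set(focus)
    selected = set()
    for center in focus:
        for index, xyz in coords.items():
            if index not in focus and math.dist(coords[center], xyz) <= distance:
                selected.add(index)
    if include_focus:
        selected |= focus
    return sorted(selected)


# Made-up coordinates (in Å) along a line; residue 1 is the focus.
toy_coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 0.0, 0.0), 3: (8.0, 0.0, 0.0), 4: (12.0, 0.0, 0.0)}

print(neighborhood(toy_coords, [1], include_focus=True))
print(neighborhood(toy_coords, [1], include_focus=False))
```

The two calls differ only in whether the focus residue appears in the result, mirroring the difference between the two PyMOL selections above (with and without residue 42).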