@Author: 吴炜坤
@email:weikun.wu@xtalpi.com/weikunwu@163.com
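A resfile is an external input file that controls the Packer: it tells the Rosetta Packer how the side-chain degrees of freedom of every residue in the structure are defined. The file is read by the ReadResfile class, which produces the corresponding TaskOperation. This lets users define packing behavior quickly outside the code, without writing out the more complex selector + RLT combination. The drawback of the resfile system is that each file normally has to be prepared by hand, which makes it awkward to automate.
# Initialize PyRosetta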
from pyrosetta import init, pose_from_pdb
from pyrosetta.rosetta.core.pack.task.operation import ReadResfile
from pyrosetta.rosetta.core.pack.task import TaskFactory
init()
pose = pose_from_pdb('./data/helix.pdb')
PyRosetta-4 2021 [Rosetta PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release 2021.26+release.b308454c455dd04f6824cc8b23e54bbb9be2cdd7 2021-07-02T13:01:54] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
core.init: {0} Checking for fconfig files in pwd and ./rosetta/flags
core.init: {0} Rosetta version: PyRosetta4.conda.mac.cxx11thread.serialization.python37.Release r288 2021.26+release.b308454c455 b308454c455dd04f6824cc8b23e54bbb9be2cdd7 http://www.pyrosetta.org 2021-07-02T13:01:54
core.init: {0} command: PyRosetta -ex1 -ex2aro -database /opt/miniconda3/lib/python3.7/site-packages/pyrosetta/database
basic.random.init_random_generator: {0} 'RNG device' seed mode, using '/dev/urandom', seed=-867162182 seed_offset=0 real_seed=-867162182 thread_index=0
basic.random.init_random_generator: {0} RandomGenerator:init: Normal mode, seed=-867162182 RG_type=mt19937
core.chemical.GlobalResidueTypeSet: {0} Finished initializing fa_standard residue type set. Created 984 residue types
core.chemical.GlobalResidueTypeSet: {0} Total time to initialize 0.668191 seconds.
core.import_pose.import_pose: {0} File './data/helix.pdb' automatically determined to be of type PDB
core.conformation.Conformation: {0} [ WARNING ] missing heavyatom: OXT on residue GLY:CtermProteinFull 14
A resfile addresses residues by PDB numbering (not pose numbering), and chain identifiers are case-sensitive. A resfile usually consists of two parts: the HEADER and the BODY.
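What each part does:
Resfile example:
The HEADER syntax controls the degrees of freedom of every residue that is not listed in the BODY. Each residue can be put in one of three basic states: design, repacking, or no repacking. All of the syntax is listed below:
Commonly used PROPERTY options:
A demo resfile using NATRO: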
NATRO
START
# Read the resfile as a TaskOperation
resfile_type = ReadResfile('./data/NATRO.resfile')
# Load the TaskOperation into a TaskFactory
pack_tf = TaskFactory()
pack_tf.push_back(resfile_type)
# Create the PackerTask
packer_task = pack_tf.create_task_and_apply_taskoperations(pose)
print(packer_task)
#Packer_Task

Threads to request: ALL AVAILABLE

resid   pack?   design? allowed_aas
1       FALSE   FALSE
2       FALSE   FALSE
3       FALSE   FALSE
4       FALSE   FALSE
5       FALSE   FALSE
6       FALSE   FALSE
7       FALSE   FALSE
8       FALSE   FALSE
9       FALSE   FALSE
10      FALSE   FALSE
11      FALSE   FALSE
12      FALSE   FALSE
13      FALSE   FALSE
14      FALSE   FALSE
Try writing more resfiles, load them with the code above, and compare how the different commands behave.
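For this kind of experiment it can help to generate the resfiles from Python rather than typing each one by hand. The following is a minimal sketch that only assembles text: `build_resfile` and `write_resfile` are hypothetical helper names introduced here, not part of PyRosetta, and the sketch assumes the layout used in this tutorial (HEADER commands, a `START` line, then BODY lines).

```python
from pathlib import Path
import tempfile


def build_resfile(header_commands, body_lines=()):
    """Return resfile text: HEADER commands, the START separator, then BODY lines."""
    lines = list(header_commands) + ["START"] + list(body_lines)
    return "\n".join(lines) + "\n"


def write_resfile(path, header_commands, body_lines=()):
    """Write the resfile text to `path` and return the path (hypothetical helper)."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(build_resfile(header_commands, body_lines))
    return path


# A header-only resfile and one with two BODY lines.
natro_text = build_resfile(["NATRO"])
design_text = build_resfile(["NATRO", "EX 1 EX 2"], ["3 A ALLAA", "4 A APOLAR"])

# Write into a temporary directory so the sketch does not touch ./data.
out_dir = Path(tempfile.mkdtemp())
written = write_resfile(
    out_dir / "position_mut.resfile",
    ["NATRO", "EX 1 EX 2"],
    ["3 A ALLAA", "4 A APOLAR"],
)
print(written.read_text())
```

The path returned by `write_resfile` can then be passed as a string to `ReadResfile` in the same way as the hand-written files in this tutorial.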
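The Rosetta Packer samples rotamers discretely: by default it only takes the conformation at the center of each rotamer well, where the population is concentrated. The Extra Rotamer controls increase rotamer sampling. With the default expansion, the conformations at +/-1 standard deviation from the mean χ are sampled in addition to the mean itself. These Extra Rotamer controls only take effect on buried residues!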
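To make the effect of the default expansion concrete, the sketch below enumerates the χ combinations produced when each expanded χ angle contributes its mean plus the values one standard deviation either side. It is an illustration of the counting only, not Rosetta's rotamer-building code; the function names and the mean and standard-deviation values are made up for the example.

```python
from itertools import product


def extra_samples(mean, sd):
    """Return the mean chi value and the values one standard deviation either side."""
    return [mean - sd, mean, mean + sd]


def expand_rotamer(chi_means, chi_sds, expanded_chis):
    """Enumerate chi combinations for one rotamer.

    chi_means / chi_sds: per-chi mean and standard deviation (degrees).
    expanded_chis: set of 1-based chi indices that receive extra samples.
    """
    per_chi = []
    for index, (mean, sd) in enumerate(zip(chi_means, chi_sds), start=1):
        if index in expanded_chis:
            per_chi.append(extra_samples(mean, sd))
        else:
            per_chi.append([mean])
    return list(product(*per_chi))


# Hypothetical rotamer with two chi angles.
means = [-65.0, 175.0]
sds = [10.0, 12.0]

only_chi1 = expand_rotamer(means, sds, {1})
chi1_and_chi2 = expand_rotamer(means, sds, {1, 2})
print(len(only_chi1), len(chi1_and_chi2))
```

Expanding χ1 alone turns one rotamer into three, and expanding χ1 and χ2 together turns it into nine, which is why extra rotamers increase the packing cost quickly.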
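To use them, add Extra Rotamer Commands to the HEADER of the resfile. There are currently four forms, depending on the amino acid:
EX <chi-id> LEVEL <level-value> syntax
EX ARO <chi-id> LEVEL <level-value> syntax
EX_CUTOFF <number of neighbors> syntax
USE_INPUT_SC syntax
The LEVEL parameter currently has 7 levels:
The following resfile is an example of controlling rotamer abundance: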
NATRO
EX 1 EX 2
START
# Read the resfile as a TaskOperation
resfile_type = ReadResfile('./data/EX1EX2.resfile')
# Load the TaskOperation into a TaskFactory
pack_tf = TaskFactory()
pack_tf.push_back(resfile_type)
# Create the PackerTask
packer_task = pack_tf.create_task_and_apply_taskoperations(pose)
# Inspect the rotamer sampling level of each residue:
print(packer_task.task_string(pose))
start 1 A NATRO EX ARO 1 EX ARO 2 2 A NATRO EX ARO 1 EX ARO 2 3 A NATRO EX ARO 1 EX ARO 2 4 A NATRO EX ARO 1 EX ARO 2 5 A NATRO EX ARO 1 EX ARO 2 6 A NATRO EX ARO 1 EX ARO 2 7 A NATRO EX ARO 1 EX ARO 2 8 A NATRO EX ARO 1 EX ARO 2 9 A NATRO EX ARO 1 EX ARO 2 10 A NATRO EX ARO 1 EX ARO 2 11 A NATRO EX ARO 1 EX ARO 2 12 A NATRO EX ARO 1 EX ARO 2 13 A NATRO EX ARO 1 EX ARO 2 14 A NATRO EX ARO 1 EX ARO 2
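As shown, every position samples extra rotamers for χ1 and χ2. (The printed result is misleading: it reports EX ARO even though ARO was never set in the resfile. This is only a display issue and does not affect the actual run.)
Besides this global control, the HEADER can also set extra rotamer sampling specifically for certain amino acids: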
1. EX 1 EX 2
2. EX ARO 2
3. EX 1 LEVEL 7
4. EX 1 EX ARO 1 LEVEL 4
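Work out what each of the lines above means.
The BODY specifies the rotamer degrees of freedom for particular positions or ranges of residues. There are four ways to specify them.
The first column is the PDB residue number (an insertion code is allowed), the second column is the PDB chain ID, and the third column is the COMMAND.
Basic syntax:
<PDBNUM>[<ICODE>] <CHAIN> <COMMANDS>
Note: ICODE is used when the PDB file contains insertion codes, as in antibodies and other specially numbered systems. It is written directly after PDBNUM, e.g. 35A, 35B. A regular PDBNUM consists of digits only.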
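The single-residue form `<PDBNUM>[<ICODE>] <CHAIN> <COMMANDS>` can be illustrated with a small parser. This is a sketch for reading the syntax, not the parser Rosetta uses: `parse_body_line` is a hypothetical helper, it assumes a one-letter insertion code and a one-character chain ID, and it deliberately rejects the range and chain-wide forms.

```python
import re

# <PDBNUM>[<ICODE>] <CHAIN> <COMMANDS>, single-residue form only.
BODY_LINE = re.compile(
    r"^\s*(?P<num>-?\d+)(?P<icode>[A-Za-z]?)"  # residue number + optional insertion code
    r"\s+(?P<chain>[A-Za-z0-9_])"              # one-character chain ID
    r"\s+(?P<commands>.+?)\s*$"                # everything else is the command part
)


def parse_body_line(line):
    """Split one single-residue BODY line into its fields (hypothetical helper)."""
    text = line.split("#", 1)[0]  # drop a trailing comment
    match = BODY_LINE.match(text)
    if match is None:
        raise ValueError(f"not a single-residue BODY line: {line!r}")
    return {
        "pdb_num": int(match["num"]),
        "icode": match["icode"] or None,
        "chain": match["chain"],
        "commands": match["commands"].split(),
    }


plain = parse_body_line("3 A ALLAA  # any of the 20 amino acids")
with_icode = parse_body_line("35B H PIKAA ST")
print(plain)
print(with_icode)
```

Here `35B` is split into residue number 35 and insertion code `B`, matching the convention that the insertion code is written directly after the number.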
Example:
NATRO
EX 1 EX 2
START
3 A ALLAA # position 3 may be designed to any of the 20 amino acids
4 A APOLAR # position 4 is restricted to nonpolar amino acids
# Read the resfile as a TaskOperation
resfile_type = ReadResfile('./data/position_mut.resfile')
# Load the TaskOperation into a TaskFactory
pack_tf = TaskFactory()
pack_tf.push_back(resfile_type)
# Create the PackerTask
packer_task = pack_tf.create_task_and_apply_taskoperations(pose)
print(packer_task)
#Packer_Task Threads to request: ALL AVAILABLE resid pack? design? allowed_aas 1 FALSE FALSE 2 FALSE FALSE 3 TRUE TRUE ALA,CYS,ASP,GLU,PHE,GLY,HIS,HIS_D,ILE,LYS,LEU,MET,ASN,PRO,GLN,ARG,SER,THR,VAL,TRP,TYR 4 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 5 FALSE FALSE 6 FALSE FALSE 7 FALSE FALSE 8 FALSE FALSE 9 FALSE FALSE 10 FALSE FALSE 11 FALSE FALSE 12 FALSE FALSE 13 FALSE FALSE 14 FALSE FALSE
Example:
NATRO
EX 1 EX 2
START
1 - 5 A APOLAR # positions 1-5 of chain A are restricted to nonpolar amino acids
# Read the resfile as a TaskOperation
resfile_type = ReadResfile('./data/range_mut.resfile')
# Load the TaskOperation into a TaskFactory
pack_tf = TaskFactory()
pack_tf.push_back(resfile_type)
# Create the PackerTask
packer_task = pack_tf.create_task_and_apply_taskoperations(pose)
print(packer_task)
#Packer_Task Threads to request: ALL AVAILABLE resid pack? design? allowed_aas 1 TRUE TRUE ALA:NtermProteinFull,CYS:NtermProteinFull,PHE:NtermProteinFull,GLY:NtermProteinFull,ILE:NtermProteinFull,LEU:NtermProteinFull,MET:NtermProteinFull,PRO:NtermProteinFull,VAL:NtermProteinFull,TRP:NtermProteinFull,TYR:NtermProteinFull 2 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 3 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 4 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 5 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 6 FALSE FALSE 7 FALSE FALSE 8 FALSE FALSE 9 FALSE FALSE 10 FALSE FALSE 11 FALSE FALSE 12 FALSE FALSE 13 FALSE FALSE 14 FALSE FALSE
Example:
NATRO
EX 1 EX 2
START
* A PROPERTY HYDROPHOBIC # every position in chain A may be designed to hydrophobic canonical amino acids
# Read the resfile as a TaskOperation
resfile_type = ReadResfile('./data/chain_range.resfile')
# Load the TaskOperation into a TaskFactory
pack_tf = TaskFactory()
pack_tf.push_back(resfile_type)
# Create the PackerTask
packer_task = pack_tf.create_task_and_apply_taskoperations(pose)
print(packer_task)
# Inspect the rotamer sampling level of each residue:
print(packer_task.task_string(pose))
#Packer_Task Threads to request: ALL AVAILABLE resid pack? design? allowed_aas 1 TRUE TRUE PHE:NtermProteinFull,ILE:NtermProteinFull,LEU:NtermProteinFull,MET:NtermProteinFull,VAL:NtermProteinFull,TRP:NtermProteinFull,TYR:NtermProteinFull 2 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 3 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 4 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 5 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 6 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 7 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 8 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 9 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 10 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 11 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 12 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 13 TRUE TRUE PHE,ILE,LEU,MET,VAL,TRP,TYR 14 TRUE TRUE PHE:CtermProteinFull,ILE:CtermProteinFull,LEU:CtermProteinFull,MET:CtermProteinFull,VAL:CtermProteinFull,TRP:CtermProteinFull,TYR:CtermProteinFull start
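To introduce noncanonical amino acids in the BODY, a special format is required (the 2019 release of Rosetta supports this syntax):
<PDBNUM>[<ICODE>] <CHAIN> <COMMANDS> X[ncaa]
The difference is in the COMMANDS part: each noncanonical amino acid must be written as "X[ncaa]", where ncaa is the three-letter code of the noncanonical amino acid.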
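As a small illustration of the `X[ncaa]` notation, the sketch below assembles a BODY line that allows several noncanonical amino acids at one position. The helper names are hypothetical, and the sketch assumes the `PIKAA` command with the `X[...]` entries written back to back, as in this tutorial's example.

```python
def pikaa_ncaa_command(ncaa_codes):
    """Return a PIKAA command listing each noncanonical residue as X[code]."""
    if not ncaa_codes:
        raise ValueError("at least one noncanonical residue code is required")
    return "PIKAA " + "".join(f"X[{code}]" for code in ncaa_codes)


def ncaa_body_line(pdb_num, chain, ncaa_codes):
    """Return a single-residue BODY line for the given position (hypothetical helper)."""
    return f"{pdb_num} {chain} {pikaa_ncaa_command(ncaa_codes)}"


line = ncaa_body_line(5, "A", ["B36", "A20"])
print(line)
```

The same three-letter codes also have to be registered in the packer palette before the task is created, as the PyRosetta code in this section does with `CustomBaseTypePackerPalette`.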
Example:
NATRO
EX 1 EX 2
START
5 A PIKAA X[B36]X[A20] # allow the noncanonical amino acids B36 and A20 at position 5
from pyrosetta.rosetta.core.pack.palette import CustomBaseTypePackerPalette
# Read the resfile as a TaskOperation
resfile_type = ReadResfile('./data/ncaa.resfile')
# First register the NCAAs in a CustomBaseTypePackerPalette
pp = CustomBaseTypePackerPalette()
pp.add_type('B36')
pp.add_type('A20')
# Load the TaskOperation into a TaskFactory
pack_tf = TaskFactory()
pack_tf.set_packer_palette(pp)  # attach the palette to the TaskFactory
pack_tf.push_back(resfile_type)
# Create the PackerTask
packer_task = pack_tf.create_task_and_apply_taskoperations(pose)
print(packer_task)
#Packer_Task

Threads to request: ALL AVAILABLE

resid   pack?   design? allowed_aas
1       FALSE   FALSE
2       FALSE   FALSE
3       FALSE   FALSE
4       FALSE   FALSE
5       TRUE    TRUE    B36,A20
6       FALSE   FALSE
7       FALSE   FALSE
8       FALSE   FALSE
9       FALSE   FALSE
10      FALSE   FALSE
11      FALSE   FALSE
12      FALSE   FALSE
13      FALSE   FALSE
14      FALSE   FALSE
When BODY specifications overlap in a resfile, there are two ways they are handled:
Example:
NATRO
EX 1 EX 2
START
1 - 5 A APOLAR # positions 1-5 of chain A only search rotamers of nonpolar amino acids
3 - 5 A POLAR # positions 3-5 of chain A only search rotamers of polar amino acids
# Read the resfile as a TaskOperation
resfile_type = ReadResfile('./data/multi-logic.resfile')
# Load the TaskOperation into a TaskFactory
pack_tf = TaskFactory()
pack_tf.push_back(resfile_type)
# Create the PackerTask
packer_task = pack_tf.create_task_and_apply_taskoperations(pose)
print(packer_task)
#Packer_Task Threads to request: ALL AVAILABLE resid pack? design? allowed_aas 1 TRUE TRUE ALA:NtermProteinFull,CYS:NtermProteinFull,PHE:NtermProteinFull,GLY:NtermProteinFull,ILE:NtermProteinFull,LEU:NtermProteinFull,MET:NtermProteinFull,PRO:NtermProteinFull,VAL:NtermProteinFull,TRP:NtermProteinFull,TYR:NtermProteinFull 2 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 3 FALSE FALSE 4 FALSE FALSE 5 FALSE FALSE 6 FALSE FALSE 7 FALSE FALSE 8 FALSE FALSE 9 FALSE FALSE 10 FALSE FALSE 11 FALSE FALSE 12 FALSE FALSE 13 FALSE FALSE 14 FALSE FALSE
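Both lines are at the same specification level (a range), so their restrictions are combined: positions 3-5 must be both nonpolar and polar, which leaves an empty set of allowed amino acids, and those positions are reported as FALSE FALSE.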
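One way to read this result is as a set intersection. The sketch below uses the amino-acid lists that the PackerTask printouts in this tutorial report for APOLAR and POLAR; `combine_same_level` is a hypothetical name, and the intersection model is an interpretation of the printed output rather than a description of Rosetta's internals.

```python
# Allowed residue types as printed by the PackerTask for each command.
APOLAR = {"ALA", "CYS", "PHE", "GLY", "ILE", "LEU", "MET", "PRO", "VAL", "TRP", "TYR"}
POLAR = {"ASP", "GLU", "HIS", "HIS_D", "LYS", "ASN", "GLN", "ARG", "SER", "THR"}


def combine_same_level(*allowed_sets):
    """Intersect the allowed sets of commands given at the same specification level."""
    result = set(allowed_sets[0])
    for allowed in allowed_sets[1:]:
        result &= allowed
    return result


positions_1_to_2 = combine_same_level(APOLAR)         # only the first range applies
positions_3_to_5 = combine_same_level(APOLAR, POLAR)  # both ranges apply
print(sorted(positions_1_to_2))
print(sorted(positions_3_to_5))
```

Positions 1-2 keep the full nonpolar set, while positions 3-5 end up with nothing to choose from.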
The other case:
Example:
NATRO
EX 1 EX 2
START
1 - 5 A APOLAR # positions 1-5 of chain A only search rotamers of nonpolar amino acids
3 A POLAR # position 3 of chain A is designed within the polar amino acids
# Read the resfile as a TaskOperation
resfile_type = ReadResfile('./data/multi-logic2.resfile')
# Load the TaskOperation into a TaskFactory
pack_tf = TaskFactory()
pack_tf.push_back(resfile_type)
# Create the PackerTask
packer_task = pack_tf.create_task_and_apply_taskoperations(pose)
print(packer_task)
#Packer_Task Threads to request: ALL AVAILABLE resid pack? design? allowed_aas 1 TRUE TRUE ALA:NtermProteinFull,CYS:NtermProteinFull,PHE:NtermProteinFull,GLY:NtermProteinFull,ILE:NtermProteinFull,LEU:NtermProteinFull,MET:NtermProteinFull,PRO:NtermProteinFull,VAL:NtermProteinFull,TRP:NtermProteinFull,TYR:NtermProteinFull 2 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 3 TRUE TRUE ASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR 4 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 5 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 6 FALSE FALSE 7 FALSE FALSE 8 FALSE FALSE 9 FALSE FALSE 10 FALSE FALSE 11 FALSE FALSE 12 FALSE FALSE 13 FALSE FALSE 14 FALSE FALSE
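Result: position 3 is designed to polar amino acids, while positions 1-2 and 4-5 are designed to nonpolar amino acids, because a single-position specification takes priority over a range specification.
One more example:
NATRO
EX 1 EX 2
START
* A POLAR # all residues of chain A are designed to polar amino acids
1 - 5 A APOLAR # positions 1-5 of chain A only search rotamers of nonpolar amino acids
# Read the resfile as a TaskOperation
resfile_type = ReadResfile('./data/multi-logic3.resfile')
# Load the TaskOperation into a TaskFactory
pack_tf = TaskFactory()
pack_tf.push_back(resfile_type)
# Create the PackerTask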
packer_task = pack_tf.create_task_and_apply_taskoperations(pose)
print(packer_task)
#Packer_Task Threads to request: ALL AVAILABLE resid pack? design? allowed_aas 1 TRUE TRUE ALA:NtermProteinFull,CYS:NtermProteinFull,PHE:NtermProteinFull,GLY:NtermProteinFull,ILE:NtermProteinFull,LEU:NtermProteinFull,MET:NtermProteinFull,PRO:NtermProteinFull,VAL:NtermProteinFull,TRP:NtermProteinFull,TYR:NtermProteinFull 2 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 3 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 4 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 5 TRUE TRUE ALA,CYS,PHE,GLY,ILE,LEU,MET,PRO,VAL,TRP,TYR 6 TRUE TRUE ASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR 7 TRUE TRUE ASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR 8 TRUE TRUE ASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR 9 TRUE TRUE ASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR 10 TRUE TRUE ASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR 11 TRUE TRUE ASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR 12 TRUE TRUE ASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR 13 TRUE TRUE ASP,GLU,HIS,HIS_D,LYS,ASN,GLN,ARG,SER,THR 14 TRUE TRUE ASP:CtermProteinFull,GLU:CtermProteinFull,HIS:CtermProteinFull,HIS_D:CtermProteinFull,LYS:CtermProteinFull,ASN:CtermProteinFull,GLN:CtermProteinFull,ARG:CtermProteinFull,SER:CtermProteinFull,THR:CtermProteinFull
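Result: positions 1-5 are designed to nonpolar amino acids and the remaining residues to polar amino acids, because a range specification takes priority over a chain-wide specification.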
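The priority order observed in these examples (single position over range, range over whole chain, and the HEADER default for everything else) can be summarized in a short lookup. This is a sketch of the rule as described in this tutorial, not Rosetta's implementation; `resolve_command` and its argument layout are hypothetical, and overlapping specifications at the same level are not modeled.

```python
def resolve_command(pdb_num, chain, single, ranges, chain_wide, default="NATRO"):
    """Pick the command that applies to one residue.

    single:     {(pdb_num, chain): command}          highest priority
    ranges:     [(start, end, chain, command), ...]  next priority
    chain_wide: {chain: command}                     lowest BODY priority
    default:    HEADER command for residues not mentioned in the BODY
    """
    if (pdb_num, chain) in single:
        return single[(pdb_num, chain)]
    for start, end, range_chain, command in ranges:
        # First matching range wins; same-level overlaps are out of scope here.
        if range_chain == chain and start <= pdb_num <= end:
            return command
    if chain in chain_wide:
        return chain_wide[chain]
    return default


# Range 1-5 APOLAR plus single position 3 POLAR.
case_single = [
    resolve_command(n, "A", {(3, "A"): "POLAR"}, [(1, 5, "A", "APOLAR")], {})
    for n in range(1, 7)
]

# Chain-wide POLAR plus range 1-5 APOLAR.
case_chain = [
    resolve_command(n, "A", {}, [(1, 5, "A", "APOLAR")], {"A": "POLAR"})
    for n in range(1, 7)
]
print(case_single)
print(case_chain)
```

In the first case position 6 falls back to the HEADER default, and in the second it picks up the chain-wide command, which mirrors the two PackerTask printouts.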