PYROSETTA.DISTRIBUTED - RosettaScripts/Python Interface Integration

Integration Components

The python software ecosystem relies on a small set of shared core interfaces utilizing primitive language-native data structures, pure function invocation, and object serialization to provide loosely coupled interoperability between independent software components. Our component, the pyrosetta.distributed namespace, utilizes established elements of the Rosetta internal architecture: the Pose model & score representation, RosettaScript protocols, and Pose serialization.

The adoption of a small set of core interfaces supports integration with an array of scientific computing tools, including support for interactive development environments, common record-oriented data formats, statistical analysis and machine learning packages, and multiple distributed computing packages. The pyrosetta.distributed package provides example integrations with several preferred packages for data analysis (Pandas), distributed computing (Dask), and interactive development (Jupyter Notebook), but is loosely coupled to allow later integration with additional libraries.

In [1]:
import pyrosetta.distributed

# Distributed components perform default initialization on-demand, but 
# can be request custom initialization via
pyrosetta.distributed.maybe_init()
PyRosetta-4 2019 [Rosetta PyRosetta4.conda.linux.CentOS.python37.Release 2019.22+release.d8f9b4a90a8f2caa32948bacdb6e551591facd5f 2019-05-30T13:47:16] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.

Data Structures (pyrosetta.distributed.packed_pose)

“Primitive” datatypes form a primary interface between many python libraries and, though not strictly defined, typically include the built-in scalar types (string, int, bool, float, ...), key-value dicts, and lists. Libraries operating on more complex user-defined classes often expose routines interconverting to and from primitive datatypes, and primitive datatypes can be efficiently serialized in multiple formats. For interaction between Rosetta protocol components and external libraries, we developed the pyrosetta.distributed.packed_pose namespace. This implements an isomorphism between the Pose object and dict-like records of the molecular model and scores. The Pose class represents a mutable, full-featured molecular model with non-trivial memory footprint. A Pose may be inexpensively interconverted to a compact binary encoding via recently developed cereal-based serialization in the suite. This serialized format is used to implement the PackedPose class, an immutable record containing model scores and the encoded model, which is isomorphic to a dict-based record. Adaptor functions within the packed_pose namespace freely adapt between collections of Pose (packed_pose.to_pose), PackedPose (packed_pose.to_packed), dict-records (packed_pose.to_dict) and pandas.DataFrame objects. (Fig 2.A)

In [2]:
import pyrosetta.distributed.packed_pose as packed_pose
import pyrosetta.distributed.io as io
import requests
import pandas

ubq = io.pose_from_pdbstring(requests.get("https://files.rcsb.org/download/1UBQ.pdb").text)

# Packed pose structures interconvert between multiple datatypes.
display(ubq)
display(packed_pose.to_pose(ubq))
display(packed_pose.to_dict(ubq).keys())
<pyrosetta.distributed.packed_pose.core.PackedPose at 0x7feeace2e090>
<pyrosetta.rosetta.core.pose.Pose at 0x7feeace31e30>
dict_keys(['pickled_pose'])

A dict-record and DataFrame interface provides zero-friction integration with a wide variety of data analysis tools and storage formats. For example, the record-oriented format can be passed through statsmodels or scikit-learn based filtering and analysis and written to any json-encoded text file, avro record-oriented storage, or parquet column-oriented storage. The pyrosetta.distributed.io namespace implements functions that mirror the pyrosetta.io namespace, providing conversion between PackedPose and the PDB, MMCIF & Rosetta silent-file formats. Critically, the PackedPose record format can also be transparently serialized, stored with a minimal memory footprint, and transmitted between processes in a distributed computing context. This allows a distributed system to process PackedPose records as plain data, storing and transmitting a large number of model decoys while only unpacking a small working set into heavyweight Pose objects.

In [3]:
# Collections of packed pose structures interconvert to pandas DataFrame.

frame_poses = pandas.DataFrame.from_records([packed_pose.to_dict(ubq) for _ in range(5)])
display(frame_poses)
pickled_pose
0 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
1 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
2 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
3 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
4 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
In [4]:
packed_poses = packed_pose.to_packed(frame_poses)
display(packed_poses)
[<pyrosetta.distributed.packed_pose.core.PackedPose at 0x7feeafae1750>,
 <pyrosetta.distributed.packed_pose.core.PackedPose at 0x7feeac779850>,
 <pyrosetta.distributed.packed_pose.core.PackedPose at 0x7feeac779650>,
 <pyrosetta.distributed.packed_pose.core.PackedPose at 0x7feeac779890>,
 <pyrosetta.distributed.packed_pose.core.PackedPose at 0x7feeac7798d0>]

Protocol Components (pyrosetta.distributed.tasks)

RosettaScripts uses an XML-based DSL to tersely encode molecular modeling protocols with a pipeline-like dataflow. The rosetta_scripts interpreter functions by parsing, XSD-validating and initializing a single RosettaScripts protocol. It then applies this protocol to input structures repeatedly to produce simulation output. Recent work has expanded support for more complex dataflow, including multi-stage operations and additional logic; however, RosettaScripts is not intended to be a general purpose programming language.

The pyrosetta.distributed.tasks namespace encapsulates the RosettaScripts interface, allowing the DSL to be utilized within python processes. Protocol components are represented as ‘task’ objects containing an XML encoded script. Task objects are serializable via the standard pickle interface, and they use a simple caching strategy to perform on-demand initialization of the underlying protocol object as needed for task application.

In [5]:
import pyrosetta.distributed.tasks.score as score
import pyrosetta.distributed.tasks.rosetta_scripts as rosetta_scripts

# A blank RosettaScripts task
blank_task = rosetta_scripts.SingleoutputRosettaScriptsTask("""
    <ROSETTASCRIPTS>
    <SCOREFXNS> </SCOREFXNS>
    <TASKOPERATIONS></TASKOPERATIONS>
    <FILTERS>
    </FILTERS>
    <MOVERS>
    </MOVERS>
    <PROTOCOLS>
    </PROTOCOLS>
    </ROSETTASCRIPTS>
    """)
display(blank_task)

# A simple scoring task
score_task = score.ScorePoseTask()
display(score_task)

# The results of filters and scores are available as the PackedPose "scores"
scored_ubq = score_task(ubq)
display(scored_ubq.scores)
SingleoutputRosettaScriptsTask(protocol_xml = '\n    <ROSETTASCRIPTS>\n    <SCOREFXNS> </SCOREFXNS>\n    <TASKOPERATIONS></TASKOPERATIONS>\n    <FILTERS>\n    </FILTERS>\n    <MOVERS>\n    </MOVERS>\n    <PROTOCOLS>\n    </PROTOCOLS>\n    </ROSETTASCRIPTS>\n    ')
ScorePoseTask(patch = None, weights = None)
{'fa_atr': -397.6465926658618,
 'fa_rep': 103.70704606947386,
 'fa_sol': 242.95183729178729,
 'fa_intra_rep': 355.46866408199486,
 'fa_intra_sol_xover4': 16.826406860919942,
 'lk_ball_wtd': -8.755571649079277,
 'fa_elec': -113.09090558288852,
 'pro_close': 1.906104764589372,
 'hbond_sr_bb': -18.828056617518506,
 'hbond_lr_bb': -23.131565839644882,
 'hbond_bb_sc': -7.389119588161401,
 'hbond_sc': -1.5490919363291988,
 'dslf_fa13': 0.0,
 'omega': 4.283688243517373,
 'fa_dun': 412.2840241807293,
 'p_aa_pp': -21.346309331921773,
 'yhh_planarity': 0.0,
 'ref': 11.884429999999998,
 'rama_prepro': -16.216376041300332,
 'total_score': 32.67775729376015}

Task components accept any valid pose-equivalent data structure and return immutable PackedPose data structures by (1) deserializing the input into a short-lived Pose object, (2) applying the parsed protocol to the Pose and (3) serializing the resulting model as a PackedPose. Two task classes, SingleOutputRosettaScriptsTask and MultipleOutputRosettaScriptsTask define either a one-to-one function returning a single output, or a one-to-many protocol component returning a lazy iterator of outputs. All tasks operate as “pure functions”, returning a modified copy rather than directly manipulating input data structures. (Fig 2.B)

In [6]:
relax_task = rosetta_scripts.SingleoutputRosettaScriptsTask("""
    <ROSETTASCRIPTS>
    <SCOREFXNS> </SCOREFXNS>
    <TASKOPERATIONS></TASKOPERATIONS>
    <FILTERS></FILTERS>
    <MOVERS>
      <FastRelax name="fastrelax" repeats="1" />
    </MOVERS>
    <PROTOCOLS>
      <Add mover="fastrelax"/>
    </PROTOCOLS>
    </ROSETTASCRIPTS>
""")

# Protocol execution does not change the input pose.
# A modified copy is returned.
relaxed_ubq = relax_task(scored_ubq)

print(f"relaxed score: {relaxed_ubq.scores['total_score']}")
print(f"delta score: {relaxed_ubq.scores['total_score'] - scored_ubq.scores['total_score']}")
relaxed score: -234.77631712424798
delta score: -267.45407441800813

Interactive Analysis and Notebook-based Computing

Notebook-based interactive analysis, typified by the Jupyter project,18 has become a dominant tool in modern data science software development. In this model, data, code, output, and visualization are combined in a single document which is viewed and edited through a browser-based interface to a remote execution environment.

To facilitate interactive analysis, we extended the PyRosetta Pose interface to expose total, residue one-body, and residue-pair two-body terms of the Rosetta score function as NumPy structured arrays. Combined with the pandas.DataFrame representation offered in pyrosetta.distributed.packed_pose, this provides an expressive interface for interactive model analysis and selection.

In [7]:
# Pose energies are available under the energies *_energies_array accessor functions.

source_energies = scored_ubq.pose.energies()
relaxed_energies = relaxed_ubq.pose.energies()
display(relaxed_energies.residue_onebody_energies_array().dtype)

source_frame = pandas.DataFrame.from_records(source_energies.residue_total_energies_array())
relaxed_frame = pandas.DataFrame.from_records(relaxed_energies.residue_total_energies_array())

delta = relaxed_frame - source_frame
delta.index.name="residue index"
delta[["total_score"]].plot(title="Delta score via relax.")
dtype([('fa_atr', '<f8'),
       ('fa_rep', '<f8'),
       ('fa_sol', '<f8'),
       ('fa_intra_rep', '<f8'),
       ('fa_intra_sol_xover4', '<f8'),
       ('lk_ball_wtd', '<f8'),
       ('fa_elec', '<f8'),
       ('pro_close', '<f8'),
       ('hbond_sr_bb', '<f8'),
       ('hbond_lr_bb', '<f8'),
       ('hbond_bb_sc', '<f8'),
       ('hbond_sc', '<f8'),
       ('dslf_fa13', '<f8'),
       ('omega', '<f8'),
       ('fa_dun', '<f8'),
       ('p_aa_pp', '<f8'),
       ('yhh_planarity', '<f8'),
       ('ref', '<f8'),
       ('rama_prepro', '<f8'),
       ('total_score', '<f8')])
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7feeada6db90>

We also integrated existing documentation into the pyrosetta.distributed.docs namespace to allow introspection-based exploration of Mover and Filter

In [8]:
import pyrosetta.distributed.docs as docs
display(dir(docs.filters)[15:20])
display(docs.filters.ChainBreak)
['BuriedUnsatHbonds',
 'CalculatorFilter',
 'ChainBreak',
 'ChainCountFilter',
 'ChainExists']
INFORMATION ABOUT FILTER "ChainBreak":

DESCRIPTION:

Measures the number of chainBreaks in the pose

USAGE:

<ChainBreak threshold=(int,"1") chain_num=(int,"1") tolerance=(real,"0.13") name=(string) confidence=(real,"1.0")>
</ChainBreak>

OPTIONS:

"ChainBreak" tag:

	threshold (int,"1"):  Number of chainbreaks allowed

	chain_num (int,"1"):  which chain should we check for

	tolerance (real,"0.13"):  the allowed angstrom deviation from the mean optimal bond length

	name (string):  The name given to this instance.

	confidence (real,"1.0"):  Probability that the pose will be filtered out if it does not pass this Filter

RosettaScripts components. Existing tools for web-based biomolecular visualization, such as py3dmol and NGLview extend this interface to a fully-featured biomolecular simulation, analysis, and visualization environment. (Fig 5)

In [9]:
import py3Dmol
view = py3Dmol.view(linked=False, width=600, height=600)
view.addModel( io.to_pdbstring(relaxed_ubq), "pdb")
view.setStyle({'stick':{}})
view.addStyle({'cartoon':{}})
view.zoomTo()

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
jupyter labextension install jupyterlab_3dmol

Out[9]:
<py3Dmol.view at 0x7feeac0db450>

Multithreaded and Distributed Execution

Remote notebook execution has the distinct advantage of allowing a user to access computational resources far beyond the capabilities of a single workstation. By using tools such as Dask via the integrations described above, a remote notebook interface can be used to manage a distributed simulation spanning hundreds of cores for rapid model analysis, and it offers a viable alternative to traditional batch-based computing for some classes of simulation.

In [10]:
import dask
import dask.distributed

# Establish a single-node cluster of worker processes.
# See dask.distributed documentation for multi-node cluster tools.
cluster = dask.distributed.Client(dask.distributed.LocalCluster())
print(cluster)
<Client: scheduler='tcp://127.0.0.1:40888' processes=8 cores=48>

Rosetta-based simulations frequently involve execution of a large number of independent monte-carlo sampling trajectories that all begin from a single starting structure; in other words, they are “embarrassingly” or “trivially” parallel. The Rosetta suite implements a job distribution framework to manage I/O and task scheduling for parallelizable workloads of this type; this allows the rosetta_scripts interpreter to operate as a single process or within MPI, BOINC, and other distributed computing frameworks. Semantics of the RosettaScripts language have also evolved to incorporate non-trivial forms of parallelism, including support for multi-stage scatter/gather protocols. Though fully functional, this framework is optimized for operation as a standalone application and does not provide straightforward integration with third party tools or generalized program logic.

The combination of immutable data structures and pure function interfaces implemented in the pyrosetta.distributed namespace provides an alternative approach to job parallelization by integrating RosettaScripts as a submodule that is compatible with dask.distributed and other task-based distributed computing frameworks. By virtue of reliance on standard python primitives, the pyrosetta.distributed namespace is not tightly coupled to a single execution engine. Single-node scheduling may be managed via the standard multiprocessing or concurrent.futures interfaces, providing a zero-dependency solution for small-scale sampling or analysis tasks. Execution via MPI-based HPC deployments may be managed via the mpi4py interface.

To support effective distributed execution, the pyrosetta.distributed namespace is intended to be installed via a build configuration of PyRosetta, provided by conda packages described above, supporting multithreaded execution. This variant utilizes existing work establishing thread-safety in the suite, and it releases the CPython global interpreter lock when calling compiled Rosetta interfaces. This enables multi-core concurrent execution of independent modeling trajectories via python-managed threads, as well as python-level operations such as network I/O and process heartbeats to occur concurrently with long-running Rosetta API calls.

In [11]:
# A "delayed" task is distributed on the worker clusters
delayed_relax = dask.delayed(rosetta_scripts.SingleoutputRosettaScriptsTask("""
    <ROSETTASCRIPTS>
    <SCOREFXNS> </SCOREFXNS>
    <TASKOPERATIONS></TASKOPERATIONS>
    <FILTERS></FILTERS>
    <MOVERS>
      <FastRelax name="fastrelax" repeats="1" />
    </MOVERS>
    <PROTOCOLS>
      <Add mover="fastrelax"/>
    </PROTOCOLS>
    </ROSETTASCRIPTS>
"""))
relax_tasks = [delayed_relax(ubq) for _ in range(64)]
display(relax_tasks[:3])
[Delayed('SingleoutputRosettaScriptsTask(protocol_xml = \'\\n    <ROSETTASCRIPTS>\\n    <SCOREFXNS> </SCOREFXNS>\\n    <TASKOPERATIONS></TASKOPERATIONS>\\n    <FILTERS></FILTERS>\\n    <MOVERS>\\n      <FastRelax name="fastrelax" repeats="1" />\\n    </MOVERS>\\n    <PROTOCOLS>\\n      <Add mover="fastrelax"/>\\n    </PROTOCOLS>\\n    </ROSETTASCRIPTS>\\n\')-bc7d516c-6955-4e22-a22e-48a279f4c541'),
 Delayed('SingleoutputRosettaScriptsTask(protocol_xml = \'\\n    <ROSETTASCRIPTS>\\n    <SCOREFXNS> </SCOREFXNS>\\n    <TASKOPERATIONS></TASKOPERATIONS>\\n    <FILTERS></FILTERS>\\n    <MOVERS>\\n      <FastRelax name="fastrelax" repeats="1" />\\n    </MOVERS>\\n    <PROTOCOLS>\\n      <Add mover="fastrelax"/>\\n    </PROTOCOLS>\\n    </ROSETTASCRIPTS>\\n\')-5885becc-e258-42e7-bdba-8acf92ec2727'),
 Delayed('SingleoutputRosettaScriptsTask(protocol_xml = \'\\n    <ROSETTASCRIPTS>\\n    <SCOREFXNS> </SCOREFXNS>\\n    <TASKOPERATIONS></TASKOPERATIONS>\\n    <FILTERS></FILTERS>\\n    <MOVERS>\\n      <FastRelax name="fastrelax" repeats="1" />\\n    </MOVERS>\\n    <PROTOCOLS>\\n      <Add mover="fastrelax"/>\\n    </PROTOCOLS>\\n    </ROSETTASCRIPTS>\\n\')-2f0c494e-0885-4f39-9f30-236c4c42115a')]
In [12]:
# Persist, beginning computation on the distributed cluster.
relax_tasks, = dask.persist(relax_tasks)
In [13]:
# Multi-threaded worker processes begin a distributed relax.
!top -bn1 | head -n 20
top - 21:42:52 up  3:06,  0 users,  load average: 0.12, 0.08, 0.30
Tasks: 636 total,   7 running, 629 sleeping,   0 stopped,   0 zombie
%Cpu(s):  1.0 us,  0.0 sy,  0.0 ni, 98.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 19383412+total, 18696489+free,  3417512 used,  3451716 buff/cache
KiB Swap:        0 total,        0 free,        0 used. 18897763+avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
20246 lexaf     20   0 1614576 442568 198956 R  81.2  0.2   0:01.59 python
20250 lexaf     20   0 1615360 441368 198936 R  81.2  0.2   0:01.59 python
20252 lexaf     20   0 1613520 439736 198764 S  81.2  0.2   0:01.60 python
20254 lexaf     20   0 1610652 437384 198760 S  81.2  0.2   0:01.60 python
20260 lexaf     20   0 1614328 442196 198952 R  81.2  0.2   0:01.58 python
20248 lexaf     20   0 1616860 439768 198928 R  75.0  0.2   0:01.59 python
20256 lexaf     20   0 1613540 441372 198936 R  75.0  0.2   0:01.58 python
20258 lexaf     20   0 1614572 442544 198892 R  75.0  0.2   0:01.58 python
 9338 root      20   0       0      0      0 S  18.8  0.0   0:11.11 socknal_sd+
 9372 root      20   0       0      0      0 S   6.2  0.0   0:00.24 ptlrpcd_02+
11121 root      20   0       0      0      0 S   6.2  0.0   0:00.29 ldlm_bl_04
14254 root      20   0       0      0      0 S   6.2  0.0   0:00.15 ldlm_bl_08
20173 lexaf     20   0 3570384 912656 235400 S   6.2  0.5   0:22.89 ZMQbg/1
In [14]:
# Compute, pulling results from workers when completed.
relax_results, = dask.compute(relax_tasks)
In [15]:
relax_result_frame = pandas.DataFrame.from_records(packed_pose.to_dict(relax_results))
display(relax_result_frame)
display(relax_result_frame.describe())
fa_atr fa_rep fa_sol fa_intra_rep fa_intra_sol_xover4 lk_ball_wtd fa_elec pro_close hbond_sr_bb hbond_lr_bb ... hbond_sc dslf_fa13 omega fa_dun p_aa_pp yhh_planarity ref rama_prepro total_score pickled_pose
0 -414.533548 87.017880 240.935466 171.634621 13.561212 -8.316931 -138.802121 0.137294 -21.396509 -24.637263 ... -9.842593 0.0 14.882225 141.846081 -28.810877 0.001671 11.88443 -25.225484 -236.875258 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
1 -417.616036 89.665402 250.540932 172.235600 14.865754 -8.941395 -143.729947 0.115414 -22.292130 -25.138046 ... -11.709581 0.0 13.564104 147.703490 -26.779853 0.005146 11.88443 -24.831541 -233.748825 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
2 -416.037696 86.559167 241.219787 171.872161 13.114300 -9.171195 -133.330778 0.140058 -21.338510 -24.602856 ... -9.847375 0.0 14.890717 137.678186 -28.645664 0.001622 11.88443 -25.542342 -236.950268 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
3 -413.652532 86.523066 241.708757 173.253910 13.964998 -8.175880 -139.544256 0.134523 -21.332893 -24.553072 ... -9.870090 0.0 14.423985 139.206573 -28.524840 0.004676 11.88443 -25.556470 -237.413003 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
4 -405.057865 83.600478 250.550884 164.492798 14.157138 -10.255674 -143.107407 0.130500 -21.819339 -24.785927 ... -10.575684 0.0 10.606302 132.080317 -26.993373 0.009503 11.88443 -24.741133 -232.844306 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
59 -412.849655 82.278599 256.142183 170.457006 12.826256 -8.202753 -141.328995 0.075621 -21.420823 -24.372237 ... -9.589471 0.0 14.844322 123.350891 -27.365344 0.001342 11.88443 -28.595809 -241.388086 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
60 -417.564965 85.638752 251.586095 167.494238 14.068854 -7.520947 -146.342724 0.112407 -21.603107 -24.666775 ... -12.079606 0.0 15.722080 128.969237 -27.488374 0.003522 11.88443 -25.949225 -250.929621 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
61 -406.718309 81.164953 250.565289 167.861354 14.206585 -7.785318 -145.653755 0.138510 -21.896157 -25.440506 ... -10.826315 0.0 13.309296 127.432946 -27.171071 0.001769 11.88443 -25.857702 -244.112752 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
62 -405.338717 81.308925 242.362518 170.440913 13.111082 -8.250842 -138.932886 0.111431 -21.123840 -25.344347 ... -13.252032 0.0 14.103101 133.353189 -26.221917 0.000495 11.88443 -23.497958 -238.356824 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...
63 -421.355828 88.145221 253.151362 167.472356 14.056122 -8.470014 -146.658022 0.105385 -21.711494 -24.673435 ... -12.078374 0.0 15.877242 132.712699 -27.756211 0.006868 11.88443 -26.570178 -249.290683 gANjcHlyb3NldHRhLnJvc2V0dGEuY29yZS5wb3NlClBvc2...

64 rows × 21 columns

fa_atr fa_rep fa_sol fa_intra_rep fa_intra_sol_xover4 lk_ball_wtd fa_elec pro_close hbond_sr_bb hbond_lr_bb hbond_bb_sc hbond_sc dslf_fa13 omega fa_dun p_aa_pp yhh_planarity ref rama_prepro total_score
count 64.000000 64.000000 64.000000 64.000000 64.000000 64.000000 64.000000 64.000000 64.000000 64.000000 64.000000 64.000000 64.0 64.000000 64.000000 64.000000 64.000000 6.400000e+01 64.000000 64.000000
mean -412.398387 84.666632 248.952723 170.862718 13.487279 -8.242926 -141.890999 0.120264 -21.631658 -24.815269 -12.999627 -10.933493 0.0 14.296867 131.351531 -27.525743 0.012438 1.188443e+01 -25.539245 -241.352149
std 5.092030 2.458225 4.847316 5.344470 0.701480 0.826497 3.770203 0.045677 0.416049 0.448841 1.457342 1.333651 0.0 2.064210 7.313366 0.896768 0.063118 1.790399e-15 1.268865 5.138987
min -421.896957 79.929049 240.367470 161.357663 12.286200 -10.388430 -150.225241 0.059318 -22.643724 -25.658065 -16.354001 -14.217683 0.0 8.937707 115.768251 -29.941382 0.000002 1.188443e+01 -29.497474 -253.163619
25% -416.177728 82.952851 245.247657 167.170257 12.843353 -8.705077 -144.755154 0.092470 -21.936929 -25.122118 -13.891951 -12.075459 0.0 12.979226 125.540210 -27.927236 0.001317 1.188443e+01 -26.209107 -244.884977
50% -412.698555 84.353729 249.709019 170.461304 13.616494 -8.236696 -142.395310 0.112472 -21.694169 -24.795306 -13.215612 -10.685479 0.0 14.532800 131.981932 -27.447900 0.002730 1.188443e+01 -25.549406 -241.091948
75% -407.814401 86.110133 251.954614 174.033605 14.086546 -7.673382 -139.732880 0.132240 -21.382692 -24.576877 -11.952388 -9.853509 0.0 15.829100 136.166153 -26.838487 0.006090 1.188443e+01 -24.869320 -237.144055
max -400.185799 90.215249 257.739908 185.313015 14.865754 -6.772219 -131.305415 0.338816 -20.472891 -22.606144 -9.671601 -7.805468 0.0 18.443958 147.703490 -26.136842 0.507180 1.188443e+01 -21.722543 -230.463111