
Split and combine multi-model PDB files, a.k.a. NMR-style multiple model PDB files, using the command line¶

Expanded and live version of this nice page here at the CCP4 user community wiki. (Only 'live' if you launched from the launch binder badge.)

Note that some molecular visualization tools handle the models produced by some of these methods better than others. For example, as discussed here, model 0 has a special meaning in Jmol: it selects all models. Only my full-featured Python script, see below, addresses this limitation.

Table of Contents
Preparation
Bash split
Awk split
Bash/Awk split
Perl split
Simple Python split
Python script split
Python script merge
Collect files

Preparation¶

Get files to use in demonstrations.

In [1]:

!curl -OL https://files.rcsb.org/download/1JOX.pdb.gz
!gunzip 1JOX.pdb.gz
!curl -OL https://files.rcsb.org/download/1G03.pdb.gz
!gunzip 1G03.pdb.gz
!curl -OL https://files.rcsb.org/download/1K8H.pdb.gz
!gunzip 1K8H.pdb.gz
!curl -OL https://files.rcsb.org/download/1g9e.pdb.gz
!gunzip 1g9e.pdb.gz
!curl -OL https://files.rcsb.org/download/1D3Z.pdb.gz
!gunzip 1D3Z.pdb.gz
!curl -OL https://files.rcsb.org/download/5ZUX.pdb.gz
!gunzip 5ZUX.pdb.gz
!mkdir pdbs
!mv 5ZUX.pdb pdbs/
!curl -OL https://files.rcsb.org/download/6BA3.pdb.gz
!gunzip 6BA3.pdb.gz
!mv 6BA3.pdb pdbs/
!curl -OL https://files.rcsb.org/download/6GDK.pdb.gz
!gunzip 6GDK.pdb.gz
!mv 6GDK.pdb pdbs/
!curl -OL https://files.rcsb.org/download/6H1K.pdb.gz
!gunzip 6H1K.pdb.gz
!mv 6H1K.pdb pdbs/
!curl -OL https://files.rcsb.org/download/6EQY.pdb.gz
!gunzip 6EQY.pdb.gz
# Prepare a directory containing individual model files to use in merge example
!mkdir models
!curl -OL https://files.rcsb.org/download/1crn.pdb.gz
!gunzip 1crn.pdb.gz
!mv 1crn.pdb models/.
!curl -OL https://files.rcsb.org/download/1tup.pdb.gz
!gunzip 1tup.pdb.gz
!mv 1tup.pdb models/.
!curl -OL https://files.rcsb.org/download/1ehz.pdb.gz
!gunzip 1ehz.pdb.gz
!mv 1ehz.pdb models/.


Bash method to split¶

Adapted to notebook from here:

The code is:

i=1
while read -a line; do
    echo "${line[@]}" >> model_${i}.pdb
    [[ ${line[0]} == ENDMDL ]] && ((i++))
 done

Example:

In [2]:

s='''i=1
while read -a line; do
    echo "${line[@]}" >> model_${i}.pdb
    [[ ${line[0]} == ENDMDL ]] && ((i++))
 done'''

%store s > split_into_models.sh

#check file made
print ("\n")
!head split_into_models.sh 

Writing 's' (str) to file 'split_into_models.sh'.


i=1
while read -a line; do
    echo "${line[@]}" >> model_${i}.pdb
    [[ ${line[0]} == ENDMDL ]] && ((i++))
 done

Run that: (You'd leave out the ! if you were actually running this in a shell terminal.)

In [3]:

!bash split_into_models.sh < 1G03.pdb

(I originally used !sh split_into_models.sh < 1G03.pdb; however, I got read: Illegal option -a. I switched to bash per Biffen's comment here.)

That produces output files with names that start with model_ followed by the model number. I'd prefer to tag them with the PDB id, too, like so:

In [4]:

!for file in model_*.pdb ; do mv "$file" "1G03${file}"; done

That could be done using purely Python with the following code:

import fnmatch
import os

tag_to_add = "1G03"
model_pattern = "model_*.pdb"
# Prefix every file in the current directory that matches the pattern
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, model_pattern):
        os.rename(file, tag_to_add + file)

Awk method to split¶

Adapted to notebook from here:

The code is:

BEGIN {file = 0; filename = "model_"  file ".pdb"}
/ENDMDL/ {getline; file ++; filename = "model_" file ".pdb"}
{print $0 > filename}

Example:

In [5]:

s='''BEGIN {file = 0; filename = "model_"  file ".pdb"}
/ENDMDL/ {getline; file ++; filename = "model_" file ".pdb"}
{print $0 > filename}'''

%store s > split_into_models.awk

#check file made
print ("\n")
!head  split_into_models.awk

Writing 's' (str) to file 'split_into_models.awk'.


BEGIN {file = 0; filename = "model_"  file ".pdb"}
/ENDMDL/ {getline; file ++; filename = "model_" file ".pdb"}
{print $0 > filename}

Run that:
(You'd leave out the ! if you were actually running this in a shell terminal.)

In [6]:

!awk -f split_into_models.awk < 1JOX.pdb

That produces output files with names that start with model_ followed by the model number. I'd prefer to tag them with the PDB id, too, like so:

In [7]:

!for file in model_*.pdb ; do mv "$file" "1JOX${file}"; done

That could be done using purely Python with the following code:

import fnmatch
import os

tag_to_add = "1JOX"
model_pattern = "model_*.pdb"
# Prefix every file in the current directory that matches the pattern
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, model_pattern):
        os.rename(file, tag_to_add + file)

Bash/awk one-liner split¶

Adapted to notebook from here:

The code is:

grep -n 'MODEL\|ENDMDL' models.pdb | cut -d: -f 1 | \
awk '{if(NR%2) printf "sed -n %d,",$1+1; else printf "%dp models.pdb > model_%03d.pdb\n", $1-1,NR/2;}' |  bash -sf

Example:

In [8]:

!grep -n 'MODEL\|ENDMDL' 1K8H.pdb | cut -d: -f 1 | awk '{if(NR%2) printf "sed -n %d,",$1+1; else printf "%dp 1K8H.pdb > 1K8Hmodel_%03d.pdb\n", $1-1,NR/2;}' |  bash -sf

Unlike above, this time I edited the naming part of the command itself rather than renaming the model_###.pdb files after they had been produced.

(Note the first file it produced in this example doesn't look like it is an actual model file.)
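One likely reason for the stray first file, offered as an assumption rather than something checked against 1K8H.pdb: the grep pattern is not anchored, so a header line that merely contains the word MODEL (in a REMARK, for instance) gets paired up as if it were a record. Below is a minimal Python sketch of anchoring the match to the record name at the start of the line. The function name and the miniature sample text are made up for illustration.

```python
def model_record_lines(pdb_text):
    """Return 1-based line numbers of lines that are MODEL or ENDMDL records.

    PDB record names occupy the first six columns, so only the start of the
    line is tested; a REMARK that mentions MODEL is ignored.
    """
    hits = []
    for number, line in enumerate(pdb_text.splitlines(), start=1):
        record = line[:6].strip()
        if record in ("MODEL", "ENDMDL"):
            hits.append(number)
    return hits


# Made-up miniature file: the REMARK line contains the word MODEL.
sample = "\n".join([
    "REMARK 210 BEST REPRESENTATIVE MODEL : 1",
    "MODEL        1",
    "ATOM      1  N   MET A   1       0.000   0.000   0.000  1.00  0.00           N",
    "ENDMDL",
    "MODEL        2",
    "ATOM      1  N   MET A   1       1.000   1.000   1.000  1.00  0.00           N",
    "ENDMDL",
])
print(model_record_lines(sample))
```

Only the true record lines are reported, so pairing them up as start and end of each model would no longer be thrown off by the header.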

Perl method to split¶

Adapted to notebook from here:

In [9]:

%%perl
$base='1g9e';open(IN,"<$base.pdb");@indata = <IN>;$i=0;
foreach $line(@indata) {
if($line =~ /^MODEL/) {++$i;$file="${base}_$i.pdb";open(OUT,">$file");next}
if($line =~ /^ENDMDL/) {next}
if($line =~ /^ATOM/ || $line =~ /^HETATM/) {print OUT "$line"}
}

Basic Python method to split¶

This version is fleshed out and documented a bit more than the code I posted on this page. Note that it starts the model numbering at one, which works better with Jmol (see above).
The main part of the code is:

 PDB_text = """
 PASTE YOUR PDB FILE TEXT HERE
 """

 model_number = 1
 new_file_text = ""
 for line in filter(None, PDB_text.splitlines()):
     line = line.strip () #for better control of ends of lines
     if line == "ENDMDL":
         # save file with file number in name
         output_file = open("model_" + str(model_number) + ".pdb", "w")
         output_file.write(new_file_text.rstrip('\r\n')) #rstrip to remove trailing newline
         output_file.close()
         # reset everything for next model
         model_number += 1
         new_file_text = ""
     elif not line.startswith("MODEL"):
         new_file_text += line + '\n'

It requires you to hand-edit the code to paste in the ENTIRE PDB file. I suggest skipping to the 'Python script method to split' below as it is more convenient. I am only posting this to show the underlying process.
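As a sketch of the same loop without the paste step, the variant below reads the PDB text from a path passed to a function. It only illustrates the process shown above and is not the script retrieved in the next step; the function name `split_models`, its `out_dir` parameter, and its return value are assumptions made for this example.

```python
from pathlib import Path


def split_models(pdb_path, out_dir="."):
    """Write model_1.pdb, model_2.pdb, ... and return the paths written.

    Follows the same idea as the loop above: lines accumulate until an
    ENDMDL record, MODEL records are dropped, and numbering starts at one.
    """
    out_dir = Path(out_dir)
    written = []
    model_number = 1
    collected = []
    for line in Path(pdb_path).read_text().splitlines():
        record = line[:6].strip()
        if record == "ENDMDL":
            # Save the lines gathered so far as one model file
            out_path = out_dir / f"model_{model_number}.pdb"
            out_path.write_text("\n".join(collected) + "\n")
            written.append(out_path)
            # Reset for the next model
            model_number += 1
            collected = []
        elif record != "MODEL" and line.strip():
            collected.append(line)
    return written
```

Calling `split_models("1D3Z.pdb")` would then write the model files into the current directory. As in the loop above, any header lines before the first MODEL record end up in the first model file, and lines after the last ENDMDL are not written anywhere.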

If you did prefer to use it, the following command would retrieve it into an active Jupyter session.

!curl -O https://raw.githubusercontent.com/fomightez/structurework/master/python_scripts/super_basic_multiple_model_PDB_file_splitter.py

Following retrieval, you'd open the file and paste your PDB file of interest in place of the text PASTE YOUR PDB FILE TEXT HERE, and then run it. On a generic command line, you'd use this command:

python super_basic_multiple_model_PDB_file_splitter.py

You can either enter that preceded by an exclamation point in a notebook, or use this command in a cell in a notebook:

%run super_basic_multiple_model_PDB_file_splitter.py

Python script method to split¶

This script is more full-featured and easier to use than the super basic Python code version. However, the basic code above is a more concise representation of what goes on in the script.

It can be pointed at a directory and process all the files ending in '.pdb' or '.PDB' in that folder.
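For illustration only, here is one way a file-or-directory argument like that could be resolved in Python. This is a sketch under my own assumptions, not code taken from the script; `find_pdb_files` is a hypothetical name.

```python
from pathlib import Path


def find_pdb_files(target):
    """Return the PDB files implied by `target`.

    A directory yields every file in it ending in '.pdb' or '.PDB';
    anything else is treated as a single input file.
    """
    target = Path(target)
    if target.is_dir():
        return sorted(p for p in target.iterdir()
                      if p.is_file() and p.suffix in (".pdb", ".PDB"))
    return [target]
```

Each returned path could then be handed to whatever splitting routine is in use, one file at a time.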

Note that it starts the model number at one which works better with Jmol, see above.

Example:

In [10]:

# retrieve the script
!curl -O https://raw.githubusercontent.com/fomightez/structurework/master/python_scripts/multiple_model_PDB_file_splitter.py


In [11]:

# Run the script
%run multiple_model_PDB_file_splitter.py 1D3Z.pdb

Reading in your file...
Concluded. 
File split into 10 models. 
Files with names '1D3Z_model_1.pdb', '1D3Z_model_2.pdb', etc., 
have been created in same directory as the input file.

You'd use the command below if you were working on an actual command line:

python multiple_model_PDB_file_splitter.py 1D3Z.pdb

Next is an example of having it act on a directory:

First we'll check what the directory looks like before running the script.

In [12]:

!ls pdbs

5ZUX.pdb  6BA3.pdb  6GDK.pdb  6H1K.pdb

To run the script targeting that directory, you'd issue the command:

In [13]:

%run multiple_model_PDB_file_splitter.py pdbs

Reading in your file...
Concluded. 
File split into 20 models. 
Files with names 'pdbs/5ZUX_model_1.pdb', 'pdbs/5ZUX_model_2.pdb', etc., 
have been created in same directory as the input file.

Reading in your file...
Concluded. 
File split into 20 models. 
Files with names 'pdbs/6GDK_model_1.pdb', 'pdbs/6GDK_model_2.pdb', etc., 
have been created in same directory as the input file.

Reading in your file...
Concluded. 
File split into 10 models. 
Files with names 'pdbs/6H1K_model_1.pdb', 'pdbs/6H1K_model_2.pdb', etc., 
have been created in same directory as the input file.

Reading in your file...
Concluded. 
File split into 20 models. 
Files with names 'pdbs/6BA3_model_1.pdb', 'pdbs/6BA3_model_2.pdb', etc., 
have been created in same directory as the input file.

Confirming it worked.

In [14]:

!ls pdbs

5ZUX_model_10.pdb  5ZUX_model_9.pdb   6BA3_model_7.pdb	 6GDK_model_5.pdb
5ZUX_model_11.pdb  5ZUX.pdb	      6BA3_model_8.pdb	 6GDK_model_6.pdb
5ZUX_model_12.pdb  6BA3_model_10.pdb  6BA3_model_9.pdb	 6GDK_model_7.pdb
5ZUX_model_13.pdb  6BA3_model_11.pdb  6BA3.pdb		 6GDK_model_8.pdb
5ZUX_model_14.pdb  6BA3_model_12.pdb  6GDK_model_10.pdb  6GDK_model_9.pdb
5ZUX_model_15.pdb  6BA3_model_13.pdb  6GDK_model_11.pdb  6GDK.pdb
5ZUX_model_16.pdb  6BA3_model_14.pdb  6GDK_model_12.pdb  6H1K_model_10.pdb
5ZUX_model_17.pdb  6BA3_model_15.pdb  6GDK_model_13.pdb  6H1K_model_1.pdb
5ZUX_model_18.pdb  6BA3_model_16.pdb  6GDK_model_14.pdb  6H1K_model_2.pdb
5ZUX_model_19.pdb  6BA3_model_17.pdb  6GDK_model_15.pdb  6H1K_model_3.pdb
5ZUX_model_1.pdb   6BA3_model_18.pdb  6GDK_model_16.pdb  6H1K_model_4.pdb
5ZUX_model_20.pdb  6BA3_model_19.pdb  6GDK_model_17.pdb  6H1K_model_5.pdb
5ZUX_model_2.pdb   6BA3_model_1.pdb   6GDK_model_18.pdb  6H1K_model_6.pdb
5ZUX_model_3.pdb   6BA3_model_20.pdb  6GDK_model_19.pdb  6H1K_model_7.pdb
5ZUX_model_4.pdb   6BA3_model_2.pdb   6GDK_model_1.pdb	 6H1K_model_8.pdb
5ZUX_model_5.pdb   6BA3_model_3.pdb   6GDK_model_20.pdb  6H1K_model_9.pdb
5ZUX_model_6.pdb   6BA3_model_4.pdb   6GDK_model_2.pdb	 6H1K.pdb
5ZUX_model_7.pdb   6BA3_model_5.pdb   6GDK_model_3.pdb
5ZUX_model_8.pdb   6BA3_model_6.pdb   6GDK_model_4.pdb

The final section of this notebook will use these files as an example of how to easily package them up for downloading to your local machine. See Collect files for easy downloading.

Python script method to merge¶

This script takes a directory as input and makes a single multi-model PDB file of any files ending in '.pdb' or '.PDB' in that folder.
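To make the idea concrete, the sketch below wraps the coordinate lines of each input file in a MODEL/ENDMDL pair using plain text handling. It is not how the script does it and only illustrates the concept; `merge_as_models` is a hypothetical helper, and keeping only ATOM and HETATM records is a simplification I chose for the example.

```python
from pathlib import Path


def merge_as_models(pdb_paths):
    """Return multi-model PDB text with one MODEL/ENDMDL block per input file.

    Only ATOM and HETATM records are kept from each input (a simplification).
    """
    blocks = []
    for number, path in enumerate(pdb_paths, start=1):
        coords = [line for line in Path(path).read_text().splitlines()
                  if line.startswith(("ATOM", "HETATM"))]
        # The PDB format puts the model serial right-justified in columns 11-14.
        blocks.append("\n".join([f"MODEL     {number:>4}", *coords, "ENDMDL"]))
    return "\n".join(blocks) + "\nEND\n"
```

The models are numbered in the order the paths are given, starting at one.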

Example:

In [15]:

# retrieve the script
!curl -O https://raw.githubusercontent.com/fomightez/structurework/master/python_scripts/merge_multi_PDBs_into_single_file.py


In [16]:

%run merge_multi_PDBs_into_single_file.py models

/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 6146.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 6147.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 6148.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain E is discontinuous at line 6149.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain F is discontinuous at line 6171.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 6185.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 6383.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 6453.
  PDBConstructionWarning)

Processing <Structure id=models/1ehz>; it will be model #1
Processing <Structure id=models/1crn>; it will be model #2
Processing <Structure id=models/1tup>; it will be model #3

The PDB-formatted file models.pdb has been created.

The next cell will further show that running that last cell resulted in creating a file with multiple models from three individual structure files.

In [17]:

!head models.pdb

MODEL      1
ATOM      1  OP3   G A   1      50.193  51.190  50.534  1.00 99.85           O  
ATOM      2  P     G A   1      50.626  49.730  50.573  1.00100.19           P  
ATOM      3  OP1   G A   1      49.854  48.893  49.562  1.00100.19           O  
ATOM      4  OP2   G A   1      52.137  49.542  50.511  1.00 99.21           O  
ATOM      5  O5'   G A   1      50.161  49.136  52.023  1.00 99.82           O  
ATOM      6  C5'   G A   1      50.216  49.948  53.210  1.00 98.63           C  
ATOM      7  C4'   G A   1      50.968  49.231  54.309  1.00 97.84           C  
ATOM      8  O4'   G A   1      50.450  47.888  54.472  1.00 97.10           O  
ATOM      9  C3'   G A   1      52.454  49.030  54.074  1.00 98.07           C

This script has two optional features that can be used:

The starting model number is customizable.
By adjusting the file names, an order can be specified for the models in the resulting file.

Each of these is demonstrated next.

Demonstrating customizing the starting number of models

You can specify the first model number using the --initial option, abbreviated -i, followed by the integer value to start with. All subsequent models are then numbered consecutively from that value.

In [18]:

%run merge_multi_PDBs_into_single_file.py models -i 23

/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 6146.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 6147.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 6148.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain E is discontinuous at line 6149.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain F is discontinuous at line 6171.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 6185.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 6383.
  PDBConstructionWarning)
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 6453.
  PDBConstructionWarning)

Processing <Structure id=models/1ehz>; it will be model #23
Processing <Structure id=models/1crn>; it will be model #24
Processing <Structure id=models/1tup>; it will be model #25

The PDB-formatted file models.pdb has been created.

The next line will show that the option worked to number the starting model as 23.

In [19]:

!head models.pdb

MODEL      23
ATOM      1  OP3   G A   1      50.193  51.190  50.534  1.00 99.85           O  
ATOM      2  P     G A   1      50.626  49.730  50.573  1.00100.19           P  
ATOM      3  OP1   G A   1      49.854  48.893  49.562  1.00100.19           O  
ATOM      4  OP2   G A   1      52.137  49.542  50.511  1.00 99.21           O  
ATOM      5  O5'   G A   1      50.161  49.136  52.023  1.00 99.82           O  
ATOM      6  C5'   G A   1      50.216  49.948  53.210  1.00 98.63           C  
ATOM      7  C4'   G A   1      50.968  49.231  54.309  1.00 97.84           C  
ATOM      8  O4'   G A   1      50.450  47.888  54.472  1.00 97.10           O  
ATOM      9  C3'   G A   1      52.454  49.030  54.074  1.00 98.07           C

Zero cannot be used as the starting number with the script. If you needed that value in the file, you'd have to adjust the model numbers afterward.
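If numbering from zero is really needed, the MODEL records can be rewritten after the merge. The sketch below is one way to do that in plain Python; `renumber_models` is a hypothetical helper, not part of the merge script, and it assumes every MODEL record begins at the start of a line.

```python
def renumber_models(pdb_text, start=0):
    """Return pdb_text with MODEL records renumbered consecutively from `start`."""
    out = []
    number = start
    for line in pdb_text.splitlines():
        if line[:6].strip() == "MODEL":
            # Serial right-justified in columns 11-14, per the PDB format
            line = f"MODEL     {number:>4}"
            number += 1
        out.append(line)
    return "\n".join(out) + "\n"
```

You could read `models.pdb` into a string, pass it through `renumber_models`, and write the result back out. Keep in mind the special meaning Jmol gives to model 0 before choosing zero as the start.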

Demonstrating ordering the models

If you add numbers to the file names, you can specify the order of the models in the merged file. For example, above 1ehz.pdb was used as the first model by default. If it had been intended to be the third model, you could rename the PDB files like this:

1crn_3.pdb
1tup_5.pdb
1ehz_7.pdb

The specific numbers don't matter. The lowest are first and the highest numbered will be last.

I'll leave that exercise to the user and instead demonstrate the script further with a larger group of files.
A merge based on that pattern of file names will be demonstrated by running the next two cells. First, we'll prepare a new version of the models directory, using one of the split methods from above to make files with names matching that pattern, and then run the merge script.

In [20]:

# Prepare a directory containing individual model files to use.
# First clear current `models` content.
!rm -rf models
# now prepare the directory with a new listing of files
!grep -n 'MODEL\|ENDMDL' 1G03.pdb | cut -d: -f 1 | awk '{if(NR%2) printf "sed -n %d,",$1+1; else printf "%dp 1G03.pdb > model_%01d.pdb\n", $1-1,NR/2;}' |  bash -sf
!rm  model_1.pdb # like in earlier demo above, first resulting supposed 'model' is just part of header and so delete
!mkdir models
!mv model_*.pdb models/
!ls models/

model_10.pdb  model_14.pdb  model_18.pdb  model_2.pdb  model_6.pdb
model_11.pdb  model_15.pdb  model_19.pdb  model_3.pdb  model_7.pdb
model_12.pdb  model_16.pdb  model_20.pdb  model_4.pdb  model_8.pdb
model_13.pdb  model_17.pdb  model_21.pdb  model_5.pdb  model_9.pdb

(Note that because the first generated 'model', model_1.pdb, was just made of part of the header it was deleted as part of the preparation.)

In [21]:

%run merge_multi_PDBs_into_single_file.py models

Processing <Structure id=models/model_2>; it will be model #1
Processing <Structure id=models/model_3>; it will be model #2
Processing <Structure id=models/model_4>; it will be model #3
Processing <Structure id=models/model_5>; it will be model #4
Processing <Structure id=models/model_6>; it will be model #5
Processing <Structure id=models/model_7>; it will be model #6
Processing <Structure id=models/model_8>; it will be model #7
Processing <Structure id=models/model_9>; it will be model #8
Processing <Structure id=models/model_10>; it will be model #9
Processing <Structure id=models/model_11>; it will be model #10
Processing <Structure id=models/model_12>; it will be model #11
Processing <Structure id=models/model_13>; it will be model #12
Processing <Structure id=models/model_14>; it will be model #13
Processing <Structure id=models/model_15>; it will be model #14
Processing <Structure id=models/model_16>; it will be model #15
Processing <Structure id=models/model_17>; it will be model #16
Processing <Structure id=models/model_18>; it will be model #17
Processing <Structure id=models/model_19>; it will be model #18
Processing <Structure id=models/model_20>; it will be model #19
Processing <Structure id=models/model_21>; it will be model #20

The PDB-formatted file models.pdb has been created.

In [22]:

!head models.pdb

MODEL      1
ATOM      1  N   PRO A   1       4.524   9.887  -0.667  1.00  0.00           N  
ATOM      2  CA  PRO A   1       5.918  10.123  -0.175  1.00  0.00           C  
ATOM      3  C   PRO A   1       5.865  10.943   1.122  1.00  0.00           C  
ATOM      4  O   PRO A   1       5.284  12.009   1.177  1.00  0.00           O  
ATOM      5  CB  PRO A   1       6.697  10.871  -1.278  1.00  0.00           C  
ATOM      6  CG  PRO A   1       5.715  11.124  -2.430  1.00  0.00           C  
ATOM      7  CD  PRO A   1       4.374  10.484  -2.030  1.00  0.00           C  
ATOM      8  H   PRO A   1       4.341   8.864  -0.711  1.00  0.00           H  
ATOM      9  H3  PRO A   1       3.846  10.334  -0.018  1.00  0.00           H

The lowest numbered model was used first and then the numbers increased to match from there.

Note that all need to match the pattern of having an integer after an underscore and before the .pdb or .PDB. If a single one doesn't match, the order will just be based on the order the files happened to get processed in. For example, if we change one of the file names to not have a number, we can see the processing won't be as ordered in the series like it was in the last run of the script above.

In [23]:

#break it by changing lowest numbered one to have a letter instead of a number.
!mv models/model_2.pdb models/model_a.pdb 
%run merge_multi_PDBs_into_single_file.py models

Processing <Structure id=models/model_11>; it will be model #1
Processing <Structure id=models/model_15>; it will be model #2
Processing <Structure id=models/model_5>; it will be model #3
Processing <Structure id=models/model_12>; it will be model #4
Processing <Structure id=models/model_7>; it will be model #5
Processing <Structure id=models/model_a>; it will be model #6
Processing <Structure id=models/model_13>; it will be model #7
Processing <Structure id=models/model_4>; it will be model #8
Processing <Structure id=models/model_10>; it will be model #9
Processing <Structure id=models/model_17>; it will be model #10
Processing <Structure id=models/model_3>; it will be model #11
Processing <Structure id=models/model_19>; it will be model #12
Processing <Structure id=models/model_9>; it will be model #13
Processing <Structure id=models/model_20>; it will be model #14
Processing <Structure id=models/model_21>; it will be model #15
Processing <Structure id=models/model_16>; it will be model #16
Processing <Structure id=models/model_6>; it will be model #17
Processing <Structure id=models/model_8>; it will be model #18
Processing <Structure id=models/model_18>; it will be model #19
Processing <Structure id=models/model_14>; it will be model #20

The PDB-formatted file models.pdb has been created.

Notice that the order in which the files were merged is much more haphazard.
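The all-or-nothing behavior seen in the last two runs can be sketched as a sort that applies only when every name carries an integer after an underscore and before the extension. How the script actually parses the names is not shown in this notebook, so the regular expression and the function `order_by_suffix` below are assumptions for illustration.

```python
import re

# Integer (optionally negative) after an underscore, right before .pdb/.PDB
ORDER_PATTERN = re.compile(r"_(-?\d+)\.(?:pdb|PDB)$")


def order_by_suffix(file_names):
    """Sort names by their trailing integer, or return them unchanged.

    If even one name lacks the `_<integer>.pdb` ending, no sorting is done.
    """
    matches = [ORDER_PATTERN.search(name) for name in file_names]
    if not all(matches):
        return list(file_names)
    keyed = zip((int(m.group(1)) for m in matches), file_names)
    return [name for _, name in sorted(keyed)]
```

With this sketch, `["model_10.pdb", "model_2.pdb"]` would come back with `model_2.pdb` first, while adding `model_a.pdb` to the list would leave the incoming order untouched.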

Let's fix it and further demonstrate that it isn't the specific number that matters, but the order.

In [24]:

# Here we'll set that one file's number lower than it was originally.
!mv models/model_a.pdb models/model_0.pdb
%run merge_multi_PDBs_into_single_file.py models

Processing <Structure id=models/model_0>; it will be model #1
Processing <Structure id=models/model_3>; it will be model #2
Processing <Structure id=models/model_4>; it will be model #3
Processing <Structure id=models/model_5>; it will be model #4
Processing <Structure id=models/model_6>; it will be model #5
Processing <Structure id=models/model_7>; it will be model #6
Processing <Structure id=models/model_8>; it will be model #7
Processing <Structure id=models/model_9>; it will be model #8
Processing <Structure id=models/model_10>; it will be model #9
Processing <Structure id=models/model_11>; it will be model #10
Processing <Structure id=models/model_12>; it will be model #11
Processing <Structure id=models/model_13>; it will be model #12
Processing <Structure id=models/model_14>; it will be model #13
Processing <Structure id=models/model_15>; it will be model #14
Processing <Structure id=models/model_16>; it will be model #15
Processing <Structure id=models/model_17>; it will be model #16
Processing <Structure id=models/model_18>; it will be model #17
Processing <Structure id=models/model_19>; it will be model #18
Processing <Structure id=models/model_20>; it will be model #19
Processing <Structure id=models/model_21>; it will be model #20

The PDB-formatted file models.pdb has been created.

In fact, negative values can even work. (Although best avoided in general to make parsing file names more robust.)

In [25]:

!mv models/model_0.pdb models/model_-1.pdb
%run merge_multi_PDBs_into_single_file.py models

Processing <Structure id=models/model_-1>; it will be model #1
Processing <Structure id=models/model_3>; it will be model #2
Processing <Structure id=models/model_4>; it will be model #3
Processing <Structure id=models/model_5>; it will be model #4
Processing <Structure id=models/model_6>; it will be model #5
Processing <Structure id=models/model_7>; it will be model #6
Processing <Structure id=models/model_8>; it will be model #7
Processing <Structure id=models/model_9>; it will be model #8
Processing <Structure id=models/model_10>; it will be model #9
Processing <Structure id=models/model_11>; it will be model #10
Processing <Structure id=models/model_12>; it will be model #11
Processing <Structure id=models/model_13>; it will be model #12
Processing <Structure id=models/model_14>; it will be model #13
Processing <Structure id=models/model_15>; it will be model #14
Processing <Structure id=models/model_16>; it will be model #15
Processing <Structure id=models/model_17>; it will be model #16
Processing <Structure id=models/model_18>; it will be model #17
Processing <Structure id=models/model_19>; it will be model #18
Processing <Structure id=models/model_20>; it will be model #19
Processing <Structure id=models/model_21>; it will be model #20

The PDB-formatted file models.pdb has been created.

Collect files for easy downloading¶

To make this example easier, we'll first change directories into the directory where we last split multi-model files into individual models.

In [26]:

%cd pdbs
!ls

/home/jovyan/pdbs
5ZUX_model_10.pdb  5ZUX_model_9.pdb   6BA3_model_7.pdb	 6GDK_model_5.pdb
5ZUX_model_11.pdb  5ZUX.pdb	      6BA3_model_8.pdb	 6GDK_model_6.pdb
5ZUX_model_12.pdb  6BA3_model_10.pdb  6BA3_model_9.pdb	 6GDK_model_7.pdb
5ZUX_model_13.pdb  6BA3_model_11.pdb  6BA3.pdb		 6GDK_model_8.pdb
5ZUX_model_14.pdb  6BA3_model_12.pdb  6GDK_model_10.pdb  6GDK_model_9.pdb
5ZUX_model_15.pdb  6BA3_model_13.pdb  6GDK_model_11.pdb  6GDK.pdb
5ZUX_model_16.pdb  6BA3_model_14.pdb  6GDK_model_12.pdb  6H1K_model_10.pdb
5ZUX_model_17.pdb  6BA3_model_15.pdb  6GDK_model_13.pdb  6H1K_model_1.pdb
5ZUX_model_18.pdb  6BA3_model_16.pdb  6GDK_model_14.pdb  6H1K_model_2.pdb
5ZUX_model_19.pdb  6BA3_model_17.pdb  6GDK_model_15.pdb  6H1K_model_3.pdb
5ZUX_model_1.pdb   6BA3_model_18.pdb  6GDK_model_16.pdb  6H1K_model_4.pdb
5ZUX_model_20.pdb  6BA3_model_19.pdb  6GDK_model_17.pdb  6H1K_model_5.pdb
5ZUX_model_2.pdb   6BA3_model_1.pdb   6GDK_model_18.pdb  6H1K_model_6.pdb
5ZUX_model_3.pdb   6BA3_model_20.pdb  6GDK_model_19.pdb  6H1K_model_7.pdb
5ZUX_model_4.pdb   6BA3_model_2.pdb   6GDK_model_1.pdb	 6H1K_model_8.pdb
5ZUX_model_5.pdb   6BA3_model_3.pdb   6GDK_model_20.pdb  6H1K_model_9.pdb
5ZUX_model_6.pdb   6BA3_model_4.pdb   6GDK_model_2.pdb	 6H1K.pdb
5ZUX_model_7.pdb   6BA3_model_5.pdb   6GDK_model_3.pdb
5ZUX_model_8.pdb   6BA3_model_6.pdb   6GDK_model_4.pdb

Now we can package up the individual models into one easy to download archive with commands like these:

In [27]:

!tar czf 5ZUX_chains.tar.gz 5ZUX_model_*.pdb

In [28]:

!tar czf 6BA3_chains.tar.gz 6BA3_model_*.pdb

In [29]:

!tar czf 6GDK_chains.tar.gz 6GDK_model_*.pdb

In [30]:

!tar czf 6H1K_chains.tar.gz 6H1K_model_*.pdb

Verify it worked by viewing the list of the files in the directory.

In [31]:

!ls

5ZUX_chains.tar.gz  5ZUX_model_9.pdb	6BA3_model_7.pdb    6GDK_model_5.pdb
5ZUX_model_10.pdb   5ZUX.pdb		6BA3_model_8.pdb    6GDK_model_6.pdb
5ZUX_model_11.pdb   6BA3_chains.tar.gz	6BA3_model_9.pdb    6GDK_model_7.pdb
5ZUX_model_12.pdb   6BA3_model_10.pdb	6BA3.pdb	    6GDK_model_8.pdb
5ZUX_model_13.pdb   6BA3_model_11.pdb	6GDK_chains.tar.gz  6GDK_model_9.pdb
5ZUX_model_14.pdb   6BA3_model_12.pdb	6GDK_model_10.pdb   6GDK.pdb
5ZUX_model_15.pdb   6BA3_model_13.pdb	6GDK_model_11.pdb   6H1K_chains.tar.gz
5ZUX_model_16.pdb   6BA3_model_14.pdb	6GDK_model_12.pdb   6H1K_model_10.pdb
5ZUX_model_17.pdb   6BA3_model_15.pdb	6GDK_model_13.pdb   6H1K_model_1.pdb
5ZUX_model_18.pdb   6BA3_model_16.pdb	6GDK_model_14.pdb   6H1K_model_2.pdb
5ZUX_model_19.pdb   6BA3_model_17.pdb	6GDK_model_15.pdb   6H1K_model_3.pdb
5ZUX_model_1.pdb    6BA3_model_18.pdb	6GDK_model_16.pdb   6H1K_model_4.pdb
5ZUX_model_20.pdb   6BA3_model_19.pdb	6GDK_model_17.pdb   6H1K_model_5.pdb
5ZUX_model_2.pdb    6BA3_model_1.pdb	6GDK_model_18.pdb   6H1K_model_6.pdb
5ZUX_model_3.pdb    6BA3_model_20.pdb	6GDK_model_19.pdb   6H1K_model_7.pdb
5ZUX_model_4.pdb    6BA3_model_2.pdb	6GDK_model_1.pdb    6H1K_model_8.pdb
5ZUX_model_5.pdb    6BA3_model_3.pdb	6GDK_model_20.pdb   6H1K_model_9.pdb
5ZUX_model_6.pdb    6BA3_model_4.pdb	6GDK_model_2.pdb    6H1K.pdb
5ZUX_model_7.pdb    6BA3_model_5.pdb	6GDK_model_3.pdb
5ZUX_model_8.pdb    6BA3_model_6.pdb	6GDK_model_4.pdb

List of the files produced from the commands just above:

5ZUX_chains.tar.gz
6BA3_chains.tar.gz
6GDK_chains.tar.gz
6H1K_chains.tar.gz

Download the gzipped tarballed archives produced to your local machine.
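The tar commands above could also be expressed in pure Python with the standard library's tarfile module. This is a minimal sketch; the function name `archive_models` is made up, and the archive naming simply copies the `_chains.tar.gz` pattern used in the cells above.

```python
import tarfile
from pathlib import Path


def archive_models(directory, pdb_id):
    """Create <pdb_id>_chains.tar.gz from <pdb_id>_model_*.pdb in `directory`.

    Returns the archive path. Members are stored under their bare file
    names, as when tar is run from inside the directory.
    """
    directory = Path(directory)
    archive_path = directory / f"{pdb_id}_chains.tar.gz"
    with tarfile.open(archive_path, "w:gz") as archive:
        for model_file in sorted(directory.glob(f"{pdb_id}_model_*.pdb")):
            archive.add(model_file, arcname=model_file.name)
    return archive_path
```

Looping over the four PDB ids and calling `archive_models("pdbs", pdb_id)` for each would produce the same set of archives as the four tar cells.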