This is an expanded and live version of this nice page here at the CCP4 user community wiki. (It is only 'live' if you launched it from the launch binder badge.)
Note that some molecular visualization tools have quirks that make the models produced by some of these methods better suited than others. For example, as discussed here, model 0 has a special meaning in Jmol: it selects all models. Only my full-featured Python script, see below, addresses this limitation.
!curl -OL https://files.rcsb.org/download/1JOX.pdb.gz
!gunzip 1JOX.pdb.gz
!curl -OL https://files.rcsb.org/download/1G03.pdb.gz
!gunzip 1G03.pdb.gz
!curl -OL https://files.rcsb.org/download/1K8H.pdb.gz
!gunzip 1K8H.pdb.gz
!curl -OL https://files.rcsb.org/download/1g9e.pdb.gz
!gunzip 1g9e.pdb.gz
!curl -OL https://files.rcsb.org/download/1D3Z.pdb.gz
!gunzip 1D3Z.pdb.gz
!curl -OL https://files.rcsb.org/download/5ZUX.pdb.gz
!gunzip 5ZUX.pdb.gz
!mkdir pdbs
!mv 5ZUX.pdb pdbs/
!curl -OL https://files.rcsb.org/download/6BA3.pdb.gz
!gunzip 6BA3.pdb.gz
!mv 6BA3.pdb pdbs/
!curl -OL https://files.rcsb.org/download/6GDK.pdb.gz
!gunzip 6GDK.pdb.gz
!mv 6GDK.pdb pdbs/
!curl -OL https://files.rcsb.org/download/6H1K.pdb.gz
!gunzip 6H1K.pdb.gz
!mv 6H1K.pdb pdbs/
!curl -OL https://files.rcsb.org/download/6EQY.pdb.gz
!gunzip 6EQY.pdb.gz
# Prepare a directory containing individual model files to use in merge example
!mkdir models
!curl -OL https://files.rcsb.org/download/1crn.pdb.gz
!gunzip 1crn.pdb.gz
!mv 1crn.pdb models/.
!curl -OL https://files.rcsb.org/download/1tup.pdb.gz
!gunzip 1tup.pdb.gz
!mv 1tup.pdb models/.
!curl -OL https://files.rcsb.org/download/1ehz.pdb.gz
!gunzip 1ehz.pdb.gz
!mv 1ehz.pdb models/.
The code is:
i=1
while read -a line; do
    echo "${line[@]}" >> model_${i}.pdb
    [[ ${line[0]} == ENDMDL ]] && ((i++))
done
Example:
s='''i=1
while read -a line; do
echo "${line[@]}" >> model_${i}.pdb
[[ ${line[0]} == ENDMDL ]] && ((i++))
done'''
%store s > split_into_models.sh
#check file made
print ("\n")
!head split_into_models.sh
Writing 's' (str) to file 'split_into_models.sh'.

i=1
while read -a line; do
echo "${line[@]}" >> model_${i}.pdb
[[ ${line[0]} == ENDMDL ]] && ((i++))
done
Run that:
(You'd leave out the ! if you were actually running this in a shell terminal.)
!bash split_into_models.sh < 1G03.pdb
(I originally used !sh split_into_models.sh < 1G03.pdb; however, that gave the error read: Illegal option -a. I switched to bash per Biffen's comment here.)
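One thing worth checking with this shell loop: read -a splits each line into words and echo "${line[@]}" joins them back with single spaces, so runs of spaces are collapsed and the fixed-column layout of PDB records is not kept. The short Python sketch below mimics that behavior on an illustrative ATOM line (built here from made-up values, not taken from the files above) to show the effect. The function name collapse_like_read_a is hypothetical.

```python
# Build an illustrative ATOM record in the standard fixed-column PDB layout.
# The values are made up for this example.
original = (
    "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
    "{:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}          {:>2}"
).format("ATOM", 1, " N", "", "PRO", "A", 1, "", 4.524, 9.887, -0.667, 1.0, 0.0, "N")


def collapse_like_read_a(line):
    """Mimic bash `read -a` followed by `echo "${line[@]}"`:
    split on whitespace, then rejoin with single spaces."""
    return " ".join(line.split())


collapsed = collapse_like_read_a(original)
print(original)
print(collapsed)

# In the PDB format the x coordinate occupies columns 31-38 (Python slice 30:38).
print(repr(original[30:38]))   # the x coordinate field
print(repr(collapsed[30:38]))  # a different part of the line after collapsing
```

Programs that read PDB files by column position may not parse the collapsed lines correctly, so it can be worth viewing one of the output files before relying on them.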
That produces output files with names that start with model_ followed by the model number. I'd prefer to tag them with the PDB id, too. Like so:
!for file in model_*.pdb ; do mv "$file" "1G03${file}"; done
That could be done using purely Python with the following code:
import os
import fnmatch

tag_to_add = "1G03"
model_pattern = "model_*.pdb"
# prepend the PDB id to every matching file in the current directory
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, model_pattern):
        os.rename(file, tag_to_add + file)
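If you expect to do this renaming for several entries, the same idea can be wrapped in a small function so the tag and directory are passed in rather than hand-edited. This is only a sketch; the function name add_tag_to_models and its parameters are made up for illustration.

```python
from pathlib import Path


def add_tag_to_models(directory, tag, pattern="model_*.pdb"):
    """Prefix `tag` to every file in `directory` matching `pattern`.

    Returns the list of new file names.
    """
    renamed = []
    # sorted() builds a list first, so renaming does not disturb the iteration
    for path in sorted(Path(directory).glob(pattern)):
        new_path = path.with_name(tag + path.name)
        path.rename(new_path)
        renamed.append(new_path.name)
    return renamed
```

For the example above, the call would be add_tag_to_models(".", "1G03").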
The code is:
BEGIN {file = 0; filename = "model_" file ".pdb"}
/ENDMDL/ {getline; file ++; filename = "model_" file ".pdb"}
{print $0 > filename}
Example:
s='''BEGIN {file = 0; filename = "model_" file ".pdb"}
/ENDMDL/ {getline; file ++; filename = "model_" file ".pdb"}
{print $0 > filename}'''
%store s > split_into_models.awk
#check file made
print ("\n")
!head split_into_models.awk
Writing 's' (str) to file 'split_into_models.awk'.

BEGIN {file = 0; filename = "model_" file ".pdb"}
/ENDMDL/ {getline; file ++; filename = "model_" file ".pdb"}
{print $0 > filename}
Run that:
(You'd leave out the ! if you were actually running this in a shell terminal.)
!awk -f split_into_models.awk < 1JOX.pdb
That produces output files with names that start with model_ followed by the model number. I'd prefer to tag them with the PDB id, too. Like so:
!for file in model_*.pdb ; do mv "$file" "1JOX${file}"; done
That could be done using purely Python with the following code:
import os
import fnmatch

tag_to_add = "1JOX"
model_pattern = "model_*.pdb"
# prepend the PDB id to every matching file in the current directory
for file in os.listdir('.'):
    if fnmatch.fnmatch(file, model_pattern):
        os.rename(file, tag_to_add + file)
The code is:
grep -n 'MODEL\|ENDMDL' models.pdb | cut -d: -f 1 | \
awk '{if(NR%2) printf "sed -n %d,",$1+1; else printf "%dp models.pdb > model_%03d.pdb\n", $1-1,NR/2;}' | bash -sf
Example:
!grep -n 'MODEL\|ENDMDL' 1K8H.pdb | cut -d: -f 1 | awk '{if(NR%2) printf "sed -n %d,",$1+1; else printf "%dp 1K8H.pdb > 1K8Hmodel_%03d.pdb\n", $1-1,NR/2;}' | bash -sf
Unlike above, this time I edited the naming part of the command itself instead of renaming the files from the model_####.pdb style after the command had run and produced that style of output.
(Note the first file it produced in this example doesn't look like it is an actual model file.)
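A likely reason, assuming the header of the entry mentions the word MODEL somewhere (for example in a REMARK line), is that the grep pattern is not anchored to the start of the line, so any line containing MODEL is counted as a boundary. The sketch below contrasts unanchored and anchored matching on made-up sample lines; the function name boundary_line_numbers is hypothetical.

```python
import re

# Match only lines whose record name (at the start of the line) is MODEL or ENDMDL.
BOUNDARY = re.compile(r"^(MODEL|ENDMDL)\b")


def boundary_line_numbers(lines, anchored=True):
    """Return the 1-based numbers of lines treated as model boundaries."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if anchored:
            is_boundary = BOUNDARY.match(line) is not None
        else:
            # an unanchored pattern matches the words anywhere in the line
            is_boundary = "MODEL" in line or "ENDMDL" in line
        if is_boundary:
            hits.append(number)
    return hits


# Made-up sample: one header line that mentions MODEL, then a single model.
sample = [
    "REMARK 210 MINIMIZED AVERAGE MODEL",
    "MODEL        1",
    "ATOM      1  N   MET A   1       0.000   0.000   0.000  1.00  0.00           N",
    "ENDMDL",
]
print(boundary_line_numbers(sample, anchored=False))
print(boundary_line_numbers(sample, anchored=True))
```

In the shell pipeline, the corresponding change would presumably be to anchor the grep pattern, for instance '^MODEL\|^ENDMDL', though I have not run that variant here.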
%%perl
$base='1g9e'; open(IN,"<$base.pdb"); @indata = <IN>; $i=0;
foreach $line(@indata) {
    if($line =~ /^MODEL/) {++$i; $file="${base}_$i.pdb"; open(OUT,">$file"); next}
    if($line =~ /^ENDMDL/) {next}
    if($line =~ /^ATOM/ || $line =~ /^HETATM/) {print OUT "$line"}
}
This version is fleshed out and documented a bit more than the code that I posted on this page. Note that it starts the model numbering at one, which works better with Jmol; see above.
The main part of the code is:
PDB_text = """
PASTE YOUR PDB FILE TEXT HERE
"""
model_number = 1
new_file_text = ""
for line in filter(None, PDB_text.splitlines()):
    line = line.strip()  # for better control of ends of lines
    if line == "ENDMDL":
        # save file with file number in name
        output_file = open("model_" + str(model_number) + ".pdb", "w")
        output_file.write(new_file_text.rstrip('\r\n'))  # rstrip to remove trailing newline
        output_file.close()
        # reset everything for next model
        model_number += 1
        new_file_text = ""
    elif not line.startswith("MODEL"):
        new_file_text += line + '\n'
It requires you to hand-edit the code to paste in the ENTIRE PDB file. I suggest skipping to the 'Python script method to split' below, as it is more convenient. I am only posting this to show the underlying process.
If you did prefer to use it, the following command would retrieve it into an active Jupyter session.
!curl -O https://raw.githubusercontent.com/fomightez/structurework/master/python_scripts/super_basic_multiple_model_PDB_file_splitter.py
Following retrieval, you'd open the file and paste your PDB file of interest in place of the text PASTE YOUR PDB FILE TEXT HERE, and then run it. This command would be used on a generic command line:
python super_basic_multiple_model_PDB_file_splitter.py
You can either enter that preceded by an exclamation point in a notebook, or use this command in a cell in a notebook:
%run super_basic_multiple_model_PDB_file_splitter.py
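For comparison, the same core logic can be arranged as functions that read the PDB text from a file, so nothing needs to be pasted in. This is a minimal sketch, not the code of either script; the names split_models and split_file are made up here. As in the basic code, any header lines before the first MODEL record end up in the first model, and anything after the last ENDMDL is dropped.

```python
def split_models(pdb_text):
    """Return a list holding the text of each model found in `pdb_text`."""
    models = []
    current = []
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            # a model just ended: store it and start collecting the next one
            models.append("\n".join(current))
            current = []
        elif not line.startswith("MODEL") and line.strip():
            # keep every non-blank line that is not a MODEL record
            current.append(line.rstrip())
    return models


def split_file(pdb_path, prefix="model_"):
    """Write model_1.pdb, model_2.pdb, ... in the current working directory.

    Returns the list of file names written.
    """
    with open(pdb_path) as handle:
        models = split_models(handle.read())
    names = []
    for number, text in enumerate(models, start=1):
        name = "{}{}.pdb".format(prefix, number)
        with open(name, "w") as out:
            out.write(text)
        names.append(name)
    return names
```

Calling split_file("1D3Z.pdb") would then write one file per model, numbered from one.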
This script is more full-featured and easier to use than the super basic Python code version. However, the basic code above is a more concise representation of what goes on in the script.
It can be pointed at a directory and process all the files ending in '.pdb' or '.PDB' in that folder.
Note that it starts the model number at one which works better with Jmol, see above.
Example:
# retrieve the script
!curl -O https://raw.githubusercontent.com/fomightez/structurework/master/python_scripts/multiple_model_PDB_file_splitter.py
# Run the script
%run multiple_model_PDB_file_splitter.py 1D3Z.pdb
Reading in your file... Concluded. File split into 10 models. Files with names '1D3Z_model_1.pdb', '1D3Z_model_2.pdb', etc., have been created in same directory as the input file.
You'd use the command below if you were working on an actual command line:
python multiple_model_PDB_file_splitter.py 1D3Z.pdb
Next is an example of having it act on a directory:
First we'll check what the directory looks like before running the script.
!ls pdbs
5ZUX.pdb 6BA3.pdb 6GDK.pdb 6H1K.pdb
To run the script targeting that directory, you'd issue the command:
%run multiple_model_PDB_file_splitter.py pdbs
Reading in your file... Concluded. File split into 20 models. Files with names 'pdbs/5ZUX_model_1.pdb', 'pdbs/5ZUX_model_2.pdb', etc., have been created in same directory as the input file. Reading in your file... Concluded. File split into 20 models. Files with names 'pdbs/6GDK_model_1.pdb', 'pdbs/6GDK_model_2.pdb', etc., have been created in same directory as the input file. Reading in your file... Concluded. File split into 10 models. Files with names 'pdbs/6H1K_model_1.pdb', 'pdbs/6H1K_model_2.pdb', etc., have been created in same directory as the input file. Reading in your file... Concluded. File split into 20 models. Files with names 'pdbs/6BA3_model_1.pdb', 'pdbs/6BA3_model_2.pdb', etc., have been created in same directory as the input file.
Confirming it worked.
!ls pdbs
5ZUX_model_10.pdb 5ZUX_model_9.pdb 6BA3_model_7.pdb 6GDK_model_5.pdb 5ZUX_model_11.pdb 5ZUX.pdb 6BA3_model_8.pdb 6GDK_model_6.pdb 5ZUX_model_12.pdb 6BA3_model_10.pdb 6BA3_model_9.pdb 6GDK_model_7.pdb 5ZUX_model_13.pdb 6BA3_model_11.pdb 6BA3.pdb 6GDK_model_8.pdb 5ZUX_model_14.pdb 6BA3_model_12.pdb 6GDK_model_10.pdb 6GDK_model_9.pdb 5ZUX_model_15.pdb 6BA3_model_13.pdb 6GDK_model_11.pdb 6GDK.pdb 5ZUX_model_16.pdb 6BA3_model_14.pdb 6GDK_model_12.pdb 6H1K_model_10.pdb 5ZUX_model_17.pdb 6BA3_model_15.pdb 6GDK_model_13.pdb 6H1K_model_1.pdb 5ZUX_model_18.pdb 6BA3_model_16.pdb 6GDK_model_14.pdb 6H1K_model_2.pdb 5ZUX_model_19.pdb 6BA3_model_17.pdb 6GDK_model_15.pdb 6H1K_model_3.pdb 5ZUX_model_1.pdb 6BA3_model_18.pdb 6GDK_model_16.pdb 6H1K_model_4.pdb 5ZUX_model_20.pdb 6BA3_model_19.pdb 6GDK_model_17.pdb 6H1K_model_5.pdb 5ZUX_model_2.pdb 6BA3_model_1.pdb 6GDK_model_18.pdb 6H1K_model_6.pdb 5ZUX_model_3.pdb 6BA3_model_20.pdb 6GDK_model_19.pdb 6H1K_model_7.pdb 5ZUX_model_4.pdb 6BA3_model_2.pdb 6GDK_model_1.pdb 6H1K_model_8.pdb 5ZUX_model_5.pdb 6BA3_model_3.pdb 6GDK_model_20.pdb 6H1K_model_9.pdb 5ZUX_model_6.pdb 6BA3_model_4.pdb 6GDK_model_2.pdb 6H1K.pdb 5ZUX_model_7.pdb 6BA3_model_5.pdb 6GDK_model_3.pdb 5ZUX_model_8.pdb 6BA3_model_6.pdb 6GDK_model_4.pdb
The final section of this notebook will use these files as an example of how to easily package them up for downloading to your local machine. See Collect files for easy downloading.
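How the directory handling might look is sketched below. This is only an assumption about the general approach, not the script's actual code, and the function name find_pdb_files is hypothetical.

```python
import os


def find_pdb_files(directory):
    """Return sorted paths of files in `directory` whose names end in .pdb or .PDB."""
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        # str.endswith accepts a tuple of suffixes to test
        if name.endswith((".pdb", ".PDB"))
        and os.path.isfile(os.path.join(directory, name))
    )
```

Each path returned could then be handed to whatever splitting routine is in use.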
This script takes a directory as input and makes a single multi-model PDB file of any files ending in '.pdb' or '.PDB' in that folder.
Example:
# retrieve the script
!curl -O https://raw.githubusercontent.com/fomightez/structurework/master/python_scripts/merge_multi_PDBs_into_single_file.py
%run merge_multi_PDBs_into_single_file.py models
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 6146. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 6147. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 6148. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain E is discontinuous at line 6149. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain F is discontinuous at line 6171. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 6185. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 6383. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 6453. PDBConstructionWarning) Processing <Structure id=models/1ehz>; it will be model #1 Processing <Structure id=models/1crn>; it will be model #2 Processing <Structure id=models/1tup>; it will be model #3 The PDB-formatted file models.pdb has been created.
The next cell will further show that running that last cell resulted in creating a file with multiple models from three individual structure files.
!head models.pdb
MODEL 1 ATOM 1 OP3 G A 1 50.193 51.190 50.534 1.00 99.85 O ATOM 2 P G A 1 50.626 49.730 50.573 1.00100.19 P ATOM 3 OP1 G A 1 49.854 48.893 49.562 1.00100.19 O ATOM 4 OP2 G A 1 52.137 49.542 50.511 1.00 99.21 O ATOM 5 O5' G A 1 50.161 49.136 52.023 1.00 99.82 O ATOM 6 C5' G A 1 50.216 49.948 53.210 1.00 98.63 C ATOM 7 C4' G A 1 50.968 49.231 54.309 1.00 97.84 C ATOM 8 O4' G A 1 50.450 47.888 54.472 1.00 97.10 O ATOM 9 C3' G A 1 52.454 49.030 54.074 1.00 98.07 C
This script has two optional features that can be used: customizing the starting model number and controlling the order of the models.
Each is demonstrated next.
Demonstrating customizing the starting number of models
You can specify a first model number using the --initial option, abbreviated -i, followed by the integer value to start with; all subsequent models will then be numbered consecutively from there.
%run merge_multi_PDBs_into_single_file.py models -i 23
/srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 6146. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 6147. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 6148. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain E is discontinuous at line 6149. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain F is discontinuous at line 6171. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain A is discontinuous at line 6185. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain B is discontinuous at line 6383. PDBConstructionWarning) /srv/conda/lib/python3.6/site-packages/Bio/PDB/StructureBuilder.py:90: PDBConstructionWarning: WARNING: Chain C is discontinuous at line 6453. PDBConstructionWarning) Processing <Structure id=models/1ehz>; it will be model #23 Processing <Structure id=models/1crn>; it will be model #24 Processing <Structure id=models/1tup>; it will be model #25 The PDB-formatted file models.pdb has been created.
The next line will show that the option worked to number the starting model as 23.
!head models.pdb
MODEL 23 ATOM 1 OP3 G A 1 50.193 51.190 50.534 1.00 99.85 O ATOM 2 P G A 1 50.626 49.730 50.573 1.00100.19 P ATOM 3 OP1 G A 1 49.854 48.893 49.562 1.00100.19 O ATOM 4 OP2 G A 1 52.137 49.542 50.511 1.00 99.21 O ATOM 5 O5' G A 1 50.161 49.136 52.023 1.00 99.82 O ATOM 6 C5' G A 1 50.216 49.948 53.210 1.00 98.63 C ATOM 7 C4' G A 1 50.968 49.231 54.309 1.00 97.84 C ATOM 8 O4' G A 1 50.450 47.888 54.472 1.00 97.10 O ATOM 9 C3' G A 1 52.454 49.030 54.074 1.00 98.07 C
Zero cannot be used as the initial value with the script. You'd have to adjust the model numbers afterwards if you needed that value in the file.
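One way such an after-the-fact adjustment could be done is sketched below. It rewrites every MODEL record so the numbering runs consecutively from a chosen start value. The function name renumber_models is made up for this illustration; the column positions follow the PDB format, where the record name sits in columns 1-6 and the model serial number is right-justified in columns 11-14.

```python
def renumber_models(pdb_text, start=0):
    """Return `pdb_text` with MODEL records renumbered consecutively from `start`."""
    number = start
    out_lines = []
    for line in pdb_text.splitlines():
        if line.startswith("MODEL"):
            # "MODEL" plus five spaces fills columns 1-10; the serial takes columns 11-14
            line = "MODEL     {:>4d}".format(number)
            number += 1
        out_lines.append(line)
    return "\n".join(out_lines) + "\n"
```

To apply it to the merged file, you would read models.pdb, pass the text through renumber_models, and write the result back out.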
Demonstrating ordering the models
If you add numbers to the file names, you can specify the order of the models in the merged file. For example, above 1ehz.pdb was used as the first model by default. If that had been intended to be the third model, you could change the names of the PDB files to be like this:
1crn_3.pdb
1tup_5.pdb
1ehz_7.pdb
The specific numbers don't matter. The lowest are first and the highest numbered will be last.
I'll leave that exercise to the user and instead demonstrate the script further with a larger group of files.
A merge based on that pattern of file names will be demonstrated by running the next two cells. First, we'll prepare a new version of the models directory, using one of the split methods from above to make files with names matching that pattern, and then run the merge script.
# Prepare a directory containing individual model files to use.
# First clear current `models` content.
!rm -rf models
# now prepare the directory with a new listing of files
!grep -n 'MODEL\|ENDMDL' 1G03.pdb | cut -d: -f 1 | awk '{if(NR%2) printf "sed -n %d,",$1+1; else printf "%dp 1G03.pdb > model_%01d.pdb\n", $1-1,NR/2;}' | bash -sf
!rm model_1.pdb # like in earlier demo above, first resulting supposed 'model' is just part of header and so delete
!mkdir models
!mv model_*.pdb models/
!ls models/
model_10.pdb model_14.pdb model_18.pdb model_2.pdb model_6.pdb model_11.pdb model_15.pdb model_19.pdb model_3.pdb model_7.pdb model_12.pdb model_16.pdb model_20.pdb model_4.pdb model_8.pdb model_13.pdb model_17.pdb model_21.pdb model_5.pdb model_9.pdb
(Note that because the first generated 'model', model_1.pdb, was just made of part of the header, it was deleted as part of the preparation.)
%run merge_multi_PDBs_into_single_file.py models
Processing <Structure id=models/model_2>; it will be model #1 Processing <Structure id=models/model_3>; it will be model #2 Processing <Structure id=models/model_4>; it will be model #3 Processing <Structure id=models/model_5>; it will be model #4 Processing <Structure id=models/model_6>; it will be model #5 Processing <Structure id=models/model_7>; it will be model #6 Processing <Structure id=models/model_8>; it will be model #7 Processing <Structure id=models/model_9>; it will be model #8 Processing <Structure id=models/model_10>; it will be model #9 Processing <Structure id=models/model_11>; it will be model #10 Processing <Structure id=models/model_12>; it will be model #11 Processing <Structure id=models/model_13>; it will be model #12 Processing <Structure id=models/model_14>; it will be model #13 Processing <Structure id=models/model_15>; it will be model #14 Processing <Structure id=models/model_16>; it will be model #15 Processing <Structure id=models/model_17>; it will be model #16 Processing <Structure id=models/model_18>; it will be model #17 Processing <Structure id=models/model_19>; it will be model #18 Processing <Structure id=models/model_20>; it will be model #19 Processing <Structure id=models/model_21>; it will be model #20 The PDB-formatted file models.pdb has been created.
!head models.pdb
MODEL 1 ATOM 1 N PRO A 1 4.524 9.887 -0.667 1.00 0.00 N ATOM 2 CA PRO A 1 5.918 10.123 -0.175 1.00 0.00 C ATOM 3 C PRO A 1 5.865 10.943 1.122 1.00 0.00 C ATOM 4 O PRO A 1 5.284 12.009 1.177 1.00 0.00 O ATOM 5 CB PRO A 1 6.697 10.871 -1.278 1.00 0.00 C ATOM 6 CG PRO A 1 5.715 11.124 -2.430 1.00 0.00 C ATOM 7 CD PRO A 1 4.374 10.484 -2.030 1.00 0.00 C ATOM 8 H PRO A 1 4.341 8.864 -0.711 1.00 0.00 H ATOM 9 H3 PRO A 1 3.846 10.334 -0.018 1.00 0.00 H
The lowest numbered model was used first and then the numbers increased to match from there.
Note that all file names need to match the pattern of having an integer after an underscore and before the .pdb or .PDB extension. If even a single one doesn't match, the order will just be based on the order in which the files happened to get processed. For example, if we change one of the file names to not have a number, we can see that the processing is no longer ordered as it was in the last run of the script above.
#break it by changing lowest numbered one to have a letter instead of a number.
!mv models/model_2.pdb models/model_a.pdb
%run merge_multi_PDBs_into_single_file.py models
Processing <Structure id=models/model_11>; it will be model #1 Processing <Structure id=models/model_15>; it will be model #2 Processing <Structure id=models/model_5>; it will be model #3 Processing <Structure id=models/model_12>; it will be model #4 Processing <Structure id=models/model_7>; it will be model #5 Processing <Structure id=models/model_a>; it will be model #6 Processing <Structure id=models/model_13>; it will be model #7 Processing <Structure id=models/model_4>; it will be model #8 Processing <Structure id=models/model_10>; it will be model #9 Processing <Structure id=models/model_17>; it will be model #10 Processing <Structure id=models/model_3>; it will be model #11 Processing <Structure id=models/model_19>; it will be model #12 Processing <Structure id=models/model_9>; it will be model #13 Processing <Structure id=models/model_20>; it will be model #14 Processing <Structure id=models/model_21>; it will be model #15 Processing <Structure id=models/model_16>; it will be model #16 Processing <Structure id=models/model_6>; it will be model #17 Processing <Structure id=models/model_8>; it will be model #18 Processing <Structure id=models/model_18>; it will be model #19 Processing <Structure id=models/model_14>; it will be model #20 The PDB-formatted file models.pdb has been created.
See the order of the files merged into the model file was much more haphazard.
Let's fix it and further demonstrate that it isn't the specific number that matters, but the order.
# Here we'll set that one file's number lower than it was originally.
!mv models/model_a.pdb models/model_0.pdb
%run merge_multi_PDBs_into_single_file.py models
Processing <Structure id=models/model_0>; it will be model #1 Processing <Structure id=models/model_3>; it will be model #2 Processing <Structure id=models/model_4>; it will be model #3 Processing <Structure id=models/model_5>; it will be model #4 Processing <Structure id=models/model_6>; it will be model #5 Processing <Structure id=models/model_7>; it will be model #6 Processing <Structure id=models/model_8>; it will be model #7 Processing <Structure id=models/model_9>; it will be model #8 Processing <Structure id=models/model_10>; it will be model #9 Processing <Structure id=models/model_11>; it will be model #10 Processing <Structure id=models/model_12>; it will be model #11 Processing <Structure id=models/model_13>; it will be model #12 Processing <Structure id=models/model_14>; it will be model #13 Processing <Structure id=models/model_15>; it will be model #14 Processing <Structure id=models/model_16>; it will be model #15 Processing <Structure id=models/model_17>; it will be model #16 Processing <Structure id=models/model_18>; it will be model #17 Processing <Structure id=models/model_19>; it will be model #18 Processing <Structure id=models/model_20>; it will be model #19 Processing <Structure id=models/model_21>; it will be model #20 The PDB-formatted file models.pdb has been created.
In fact, negative values can even work. (Although best avoided in general to make parsing file names more robust.)
!mv models/model_0.pdb models/model_-1.pdb
%run merge_multi_PDBs_into_single_file.py models
Processing <Structure id=models/model_-1>; it will be model #1 Processing <Structure id=models/model_3>; it will be model #2 Processing <Structure id=models/model_4>; it will be model #3 Processing <Structure id=models/model_5>; it will be model #4 Processing <Structure id=models/model_6>; it will be model #5 Processing <Structure id=models/model_7>; it will be model #6 Processing <Structure id=models/model_8>; it will be model #7 Processing <Structure id=models/model_9>; it will be model #8 Processing <Structure id=models/model_10>; it will be model #9 Processing <Structure id=models/model_11>; it will be model #10 Processing <Structure id=models/model_12>; it will be model #11 Processing <Structure id=models/model_13>; it will be model #12 Processing <Structure id=models/model_14>; it will be model #13 Processing <Structure id=models/model_15>; it will be model #14 Processing <Structure id=models/model_16>; it will be model #15 Processing <Structure id=models/model_17>; it will be model #16 Processing <Structure id=models/model_18>; it will be model #17 Processing <Structure id=models/model_19>; it will be model #18 Processing <Structure id=models/model_20>; it will be model #19 Processing <Structure id=models/model_21>; it will be model #20 The PDB-formatted file models.pdb has been created.
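The ordering behavior seen in these runs can be sketched roughly as follows. This is an assumption about how such ordering could be implemented, based only on the behavior shown above, and is not taken from the script itself; the names NUMBER_BEFORE_EXTENSION and order_model_files are hypothetical.

```python
import re

# an integer (optionally negative) after an underscore and right before .pdb / .PDB
NUMBER_BEFORE_EXTENSION = re.compile(r"_(-?\d+)\.(?:pdb|PDB)$")


def order_model_files(file_names):
    """Sort by the trailing integer if every name has one; otherwise keep the given order."""
    keyed = []
    for name in file_names:
        match = NUMBER_BEFORE_EXTENSION.search(name)
        if match is None:
            # a single non-matching name disables the numeric ordering
            return list(file_names)
        keyed.append((int(match.group(1)), name))
    return [name for _, name in sorted(keyed)]
```

With names such as 1crn_3.pdb, 1tup_5.pdb, and 1ehz_7.pdb, this would place 1crn first and 1ehz last, matching the earlier description that only the relative order of the numbers matters.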
To make this example easier, we'll first change directories into the directory where we last split multi-model files into individual models.
%cd pdbs
!ls
/home/jovyan/pdbs 5ZUX_model_10.pdb 5ZUX_model_9.pdb 6BA3_model_7.pdb 6GDK_model_5.pdb 5ZUX_model_11.pdb 5ZUX.pdb 6BA3_model_8.pdb 6GDK_model_6.pdb 5ZUX_model_12.pdb 6BA3_model_10.pdb 6BA3_model_9.pdb 6GDK_model_7.pdb 5ZUX_model_13.pdb 6BA3_model_11.pdb 6BA3.pdb 6GDK_model_8.pdb 5ZUX_model_14.pdb 6BA3_model_12.pdb 6GDK_model_10.pdb 6GDK_model_9.pdb 5ZUX_model_15.pdb 6BA3_model_13.pdb 6GDK_model_11.pdb 6GDK.pdb 5ZUX_model_16.pdb 6BA3_model_14.pdb 6GDK_model_12.pdb 6H1K_model_10.pdb 5ZUX_model_17.pdb 6BA3_model_15.pdb 6GDK_model_13.pdb 6H1K_model_1.pdb 5ZUX_model_18.pdb 6BA3_model_16.pdb 6GDK_model_14.pdb 6H1K_model_2.pdb 5ZUX_model_19.pdb 6BA3_model_17.pdb 6GDK_model_15.pdb 6H1K_model_3.pdb 5ZUX_model_1.pdb 6BA3_model_18.pdb 6GDK_model_16.pdb 6H1K_model_4.pdb 5ZUX_model_20.pdb 6BA3_model_19.pdb 6GDK_model_17.pdb 6H1K_model_5.pdb 5ZUX_model_2.pdb 6BA3_model_1.pdb 6GDK_model_18.pdb 6H1K_model_6.pdb 5ZUX_model_3.pdb 6BA3_model_20.pdb 6GDK_model_19.pdb 6H1K_model_7.pdb 5ZUX_model_4.pdb 6BA3_model_2.pdb 6GDK_model_1.pdb 6H1K_model_8.pdb 5ZUX_model_5.pdb 6BA3_model_3.pdb 6GDK_model_20.pdb 6H1K_model_9.pdb 5ZUX_model_6.pdb 6BA3_model_4.pdb 6GDK_model_2.pdb 6H1K.pdb 5ZUX_model_7.pdb 6BA3_model_5.pdb 6GDK_model_3.pdb 5ZUX_model_8.pdb 6BA3_model_6.pdb 6GDK_model_4.pdb
Now we can package up the individual models into one easy to download archive with commands like these:
!tar czf 5ZUX_chains.tar.gz 5ZUX_model_*.pdb
!tar czf 6BA3_chains.tar.gz 6BA3_model_*.pdb
!tar czf 6GDK_chains.tar.gz 6GDK_model_*.pdb
!tar czf 6H1K_chains.tar.gz 6H1K_model_*.pdb
Verify it worked by viewing the list of the files in the directory.
!ls
5ZUX_chains.tar.gz 5ZUX_model_9.pdb 6BA3_model_7.pdb 6GDK_model_5.pdb 5ZUX_model_10.pdb 5ZUX.pdb 6BA3_model_8.pdb 6GDK_model_6.pdb 5ZUX_model_11.pdb 6BA3_chains.tar.gz 6BA3_model_9.pdb 6GDK_model_7.pdb 5ZUX_model_12.pdb 6BA3_model_10.pdb 6BA3.pdb 6GDK_model_8.pdb 5ZUX_model_13.pdb 6BA3_model_11.pdb 6GDK_chains.tar.gz 6GDK_model_9.pdb 5ZUX_model_14.pdb 6BA3_model_12.pdb 6GDK_model_10.pdb 6GDK.pdb 5ZUX_model_15.pdb 6BA3_model_13.pdb 6GDK_model_11.pdb 6H1K_chains.tar.gz 5ZUX_model_16.pdb 6BA3_model_14.pdb 6GDK_model_12.pdb 6H1K_model_10.pdb 5ZUX_model_17.pdb 6BA3_model_15.pdb 6GDK_model_13.pdb 6H1K_model_1.pdb 5ZUX_model_18.pdb 6BA3_model_16.pdb 6GDK_model_14.pdb 6H1K_model_2.pdb 5ZUX_model_19.pdb 6BA3_model_17.pdb 6GDK_model_15.pdb 6H1K_model_3.pdb 5ZUX_model_1.pdb 6BA3_model_18.pdb 6GDK_model_16.pdb 6H1K_model_4.pdb 5ZUX_model_20.pdb 6BA3_model_19.pdb 6GDK_model_17.pdb 6H1K_model_5.pdb 5ZUX_model_2.pdb 6BA3_model_1.pdb 6GDK_model_18.pdb 6H1K_model_6.pdb 5ZUX_model_3.pdb 6BA3_model_20.pdb 6GDK_model_19.pdb 6H1K_model_7.pdb 5ZUX_model_4.pdb 6BA3_model_2.pdb 6GDK_model_1.pdb 6H1K_model_8.pdb 5ZUX_model_5.pdb 6BA3_model_3.pdb 6GDK_model_20.pdb 6H1K_model_9.pdb 5ZUX_model_6.pdb 6BA3_model_4.pdb 6GDK_model_2.pdb 6H1K.pdb 5ZUX_model_7.pdb 6BA3_model_5.pdb 6GDK_model_3.pdb 5ZUX_model_8.pdb 6BA3_model_6.pdb 6GDK_model_4.pdb
List of the files produced from the commands just above:
5ZUX_chains.tar.gz
6BA3_chains.tar.gz
6GDK_chains.tar.gz
6H1K_chains.tar.gz
Download the gzipped tarballed archives produced to your local machine.
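The same packaging can also be done from Python with the standard library's tarfile module, which may be handy if tar is not available. A minimal sketch follows; the function name archive_models and its parameters are made up for illustration, and the default archive suffix simply mirrors the names used in the commands above.

```python
import glob
import os
import tarfile


def archive_models(pdb_id, directory=".", suffix="_chains.tar.gz"):
    """Bundle <pdb_id>_model_*.pdb files from `directory` into a gzipped tar archive.

    Returns the path of the archive that was written.
    """
    archive_path = os.path.join(directory, pdb_id + suffix)
    members = sorted(glob.glob(os.path.join(directory, pdb_id + "_model_*.pdb")))
    with tarfile.open(archive_path, "w:gz") as archive:
        for path in members:
            # store just the file name, not the directory part
            archive.add(path, arcname=os.path.basename(path))
    return archive_path
```

For instance, archive_models("5ZUX") run inside the pdbs directory would write 5ZUX_chains.tar.gz there.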