BioBlend is a Python library for working with Galaxy. It provides high-level functions for deploying and deleting Galaxy cloud instances from Python. It also supports loading and executing workflows from Python in a simple format.
BioBlend: https://github.com/afgane/bioblend
An example of using BioBlend:
from bioblend.galaxy import GalaxyInstance
gi = GalaxyInstance('<Galaxy IP>', key='your API key')
libs = gi.libraries.get_libraries()
gi.workflows.show_workflow('workflow ID')
gi.workflows.run_workflow('workflow ID', input_dataset_map)
This tutorial shows how to use the BioBlend library with workflow examples, so that Python developers can use Galaxy tools from Python and IPython.
This step connects to a Galaxy server using BioBlend. A server URL and an API key are required to connect.
The server URL is the address of the installed Galaxy server, and the API key identifies a user.
We have installed Galaxy on the local machine alongside IPython, so the Galaxy URL uses the local IP address and the default port 8080, e.g. http://127.0.0.1:8080.
If you want to use a different Galaxy server, e.g. the public Galaxy server hosted by Penn State University, use https://main.g2.bx.psu.edu/.
For galaxy_api_key, a Galaxy user needs to get the string from http://[galaxy_server]/user/api_keys?cntrller=user
It works like a password, and only a logged-in user can obtain their key. Here, I used the key d8699f27a08cc6f42a065e39955b6c47 for my account on the local Galaxy server.
from bioblend.galaxy import GalaxyInstance
galaxy_url = "http://127.0.0.1:8080"
galaxy_api_key = "d8699f27a08cc6f42a065e39955b6c47"
gi = GalaxyInstance(url=galaxy_url, key=galaxy_api_key)
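Hard-coding an API key in a notebook makes it easy to leak when the notebook is shared. One possible alternative, shown here only as a minimal sketch, is to read the settings from environment variables. The variable names `GALAXY_URL` and `GALAXY_API_KEY` and the helper `load_galaxy_settings` are hypothetical and are not part of BioBlend.

```python
import os


def load_galaxy_settings(environ=None):
    """Return (url, api_key) read from environment variables.

    GALAXY_URL and GALAXY_API_KEY are hypothetical names chosen for this
    sketch; BioBlend itself does not read them.
    """
    environ = os.environ if environ is None else environ
    # Fall back to the local server used in this tutorial.
    url = environ.get("GALAXY_URL", "http://127.0.0.1:8080")
    key = environ.get("GALAXY_API_KEY")
    if not key:
        raise ValueError("GALAXY_API_KEY is not set")
    return url, key


# Usage (requires BioBlend and a reachable Galaxy server):
# from bioblend.galaxy import GalaxyInstance
# galaxy_url, galaxy_api_key = load_galaxy_settings()
# gi = GalaxyInstance(url=galaxy_url, key=galaxy_api_key)
```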
Once the connection is established, listing the Galaxy histories is a good way to check that it works.
get_histories() returns a list of the histories that belong to the logged-in user.
hl = gi.histories.get_histories()
hl
[{u'deleted': False, u'id': u'df7a1f0c02a5b08e', u'model_class': u'History', u'name': u'Unnamed history', u'published': True, u'tags': [], u'url': u'/api/histories/df7a1f0c02a5b08e'}, {u'deleted': False, u'id': u'5969b1f7201f12ae', u'model_class': u'History', u'name': u'Unnamed history', u'published': False, u'tags': [], u'url': u'/api/histories/5969b1f7201f12ae'}, {u'deleted': False, u'id': u'a799d38679e985db', u'model_class': u'History', u'name': u'Unnamed history', u'published': False, u'tags': [], u'url': u'/api/histories/a799d38679e985db'}, {u'deleted': False, u'id': u'33b43b4e7093c91f', u'model_class': u'History', u'name': u'Unnamed history', u'published': False, u'tags': [], u'url': u'/api/histories/33b43b4e7093c91f'}, {u'deleted': False, u'id': u'ebfb8f50c6abde6d', u'model_class': u'History', u'name': u'Unnamed history', u'published': False, u'tags': [], u'url': u'/api/histories/ebfb8f50c6abde6d'}]
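The result is a plain list of dictionaries, so ordinary Python is enough to pick out the histories of interest. The sketch below filters on the fields visible in the output; the helper `select_histories` is hypothetical, and the sample list is abbreviated from the output above.

```python
def select_histories(histories, **criteria):
    """Return the histories whose fields equal every given criterion."""
    return [h for h in histories
            if all(h.get(field) == value for field, value in criteria.items())]


# Two entries abbreviated from the get_histories() output.
hl_sample = [
    {'id': 'df7a1f0c02a5b08e', 'name': 'Unnamed history',
     'published': True, 'deleted': False},
    {'id': '5969b1f7201f12ae', 'name': 'Unnamed history',
     'published': False, 'deleted': False},
]

published_ids = [h['id'] for h in select_histories(hl_sample, published=True)]
print(published_ids)
```

With a live connection, the same call would be `select_histories(hl, published=True)`.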
Note.
Each of Galaxy's analysis tools has an id, for example CONVERTER_interval_to_bedstrict_0.
In a workflow JSON file, the "tool_id" field holds this id, e.g. https://gist.github.com/lee212/f1449352334a2268b849
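To see which tools a workflow file refers to, the "tool_id" values can be collected with the standard `json` module. This is a sketch under an assumption: the file is taken to have a top-level "steps" mapping whose entries may carry a "tool_id". The sample content is hypothetical and heavily trimmed, not the content of the gist above.

```python
import json


def tool_ids_from_workflow(workflow_json):
    """Return the sorted, de-duplicated tool ids found in a workflow description.

    Assumes a top-level "steps" mapping; input steps have no tool, so their
    "tool_id" is null and they are skipped.
    """
    steps = json.loads(workflow_json).get("steps", {})
    return sorted({s["tool_id"] for s in steps.values() if s.get("tool_id")})


# Hypothetical, trimmed workflow file content for illustration only.
sample = '''
{"name": "example",
 "steps": {"0": {"type": "data_input", "tool_id": null},
           "1": {"type": "tool", "tool_id": "gops_join_1"},
           "2": {"type": "tool", "tool_id": "sort1"}}}
'''
print(tool_ids_from_workflow(sample))
```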
BioBlend provides basic functions to load and run workflows. get_workflows() returns a list of the workflows that the Galaxy user has.
workflows = gi.workflows.get_workflows()
workflows
[{u'id': u'f597429621d6eb2b', u'model_class': u'StoredWorkflow', u'name': u"Workflow constructed from history 'Unnamed history'", u'published': True, u'tags': [], u'url': u'/api/workflows/f597429621d6eb2b'}, {u'id': u'1cd8e2f6b131e891', u'model_class': u'StoredWorkflow', u'name': u'Galaxy 101 (imported from uploaded file)', u'published': False, u'tags': [], u'url': u'/api/workflows/1cd8e2f6b131e891'}]
There are two workflows stored in the database. Let's select the second workflow, named 'Galaxy 101', and see what components it has.
show_workflow() returns detailed information about a workflow, such as its id, inputs, and steps.
workflow = workflows[1]
res = gi.workflows.show_workflow(workflow['id'])
res
{u'id': u'1cd8e2f6b131e891', u'inputs': {u'29': {u'label': u'Features', u'value': u''}, u'30': {u'label': u'Exons', u'value': u''}}, u'model_class': u'StoredWorkflow', u'name': u'Galaxy 101 (imported from uploaded file)', u'published': False, u'steps': {u'24': {u'id': 24, u'input_steps': {u'input1': {u'source_step': 30, u'step_output': u'output'}, u'input2': {u'source_step': 29, u'step_output': u'output'}}, u'tool_id': u'gops_join_1', u'type': u'tool'}, u'25': {u'id': 25, u'input_steps': {u'input': {u'source_step': 26, u'step_output': u'out_file1'}}, u'tool_id': u'sort1', u'type': u'tool'}, u'26': {u'id': 26, u'input_steps': {u'input1': {u'source_step': 24, u'step_output': u'output'}}, u'tool_id': u'Grouping1', u'type': u'tool'}, u'27': {u'id': 27, u'input_steps': {u'input1': {u'source_step': 30, u'step_output': u'output'}, u'input2': {u'source_step': 28, u'step_output': u'out_file1'}}, u'tool_id': u'comp1', u'type': u'tool'}, u'28': {u'id': 28, u'input_steps': {u'input': {u'source_step': 25, u'step_output': u'out_file1'}}, u'tool_id': u'Show beginning1', u'type': u'tool'}, u'29': {u'id': 29, u'input_steps': {}, u'tool_id': None, u'type': u'data_input'}, u'30': {u'id': 30, u'input_steps': {}, u'tool_id': None, u'type': u'data_input'}}, u'tags': [], u'url': u'/api/workflows/1cd8e2f6b131e891'}
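The `steps` entry is keyed by step id and is not listed in execution order; each step names the steps it reads from under `input_steps`. A minimal sketch of recovering a dependency order from that information follows. The helper `execution_order` is hypothetical, and the sample is trimmed from the output above to the fields the helper uses.

```python
def execution_order(steps):
    """Order step ids so that every step comes after the steps it reads from."""
    # Map each step id to the set of step ids it depends on.
    deps = {int(sid): {src['source_step'] for src in step['input_steps'].values()}
            for sid, step in steps.items()}
    ordered = []
    while deps:
        # Steps whose dependencies have all been emitted already.
        ready = sorted(sid for sid, d in deps.items() if d <= set(ordered))
        if not ready:
            raise ValueError("cyclic or dangling step dependencies")
        ordered.extend(ready)
        for sid in ready:
            del deps[sid]
    return ordered


# Trimmed from the show_workflow() output: tool ids and source steps only.
steps = {
    '24': {'tool_id': 'gops_join_1',
           'input_steps': {'input1': {'source_step': 30}, 'input2': {'source_step': 29}}},
    '25': {'tool_id': 'sort1', 'input_steps': {'input': {'source_step': 26}}},
    '26': {'tool_id': 'Grouping1', 'input_steps': {'input1': {'source_step': 24}}},
    '27': {'tool_id': 'comp1',
           'input_steps': {'input1': {'source_step': 30}, 'input2': {'source_step': 28}}},
    '28': {'tool_id': 'Show beginning1', 'input_steps': {'input': {'source_step': 25}}},
    '29': {'tool_id': None, 'input_steps': {}},
    '30': {'tool_id': None, 'input_steps': {}},
}

order = execution_order(steps)
for sid in order:
    print(sid, steps[str(sid)]['tool_id'] or 'data_input')
```

For this workflow the resulting order is 29, 30, 24, 26, 25, 28, 27: the two inputs, then join, group, sort, select first, and compare.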
run_workflow() executes the workflow on the given input datasets and places the results in the selected history.
It returns the ids of the output datasets, which hold the results of each step in the workflow.
dataset_map = {'30': {'id': 'cbbbf59e8f08c98c', 'src': 'hda'},
               '29': {'id': '964b37715ec9bd22', 'src': 'hda'}}
outputs = gi.workflows.run_workflow(workflow['id'], dataset_map, history_id='df7a1f0c02a5b08e')  # alternatively: history_name='test1withhda'
{u'history': u'df7a1f0c02a5b08e', u'outputs': [u'6fc9fbb81c497f69', u'6fb17d0cc6e8fae5', u'5114a2a207b7caff', u'06ec17aefa2d49dd', u'b8a0d6158b9961df']}
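The keys of the dataset map ('29' and '30') are the input step ids reported under `inputs` by show_workflow(). Since those ids are easy to mix up, one option is to build the map from the input labels instead. This is a sketch only; the helper `build_dataset_map` is hypothetical, and the pairing of labels with dataset ids is read off the calls above.

```python
def build_dataset_map(workflow_details, datasets_by_label, src='hda'):
    """Translate {input label: dataset id} into {step id: {'id': ..., 'src': ...}}."""
    label_to_step = {info['label']: step_id
                     for step_id, info in workflow_details['inputs'].items()}
    missing = set(label_to_step) - set(datasets_by_label)
    if missing:
        raise KeyError("no dataset given for input(s): %s" % ", ".join(sorted(missing)))
    return {label_to_step[label]: {'id': dataset_id, 'src': src}
            for label, dataset_id in datasets_by_label.items()}


# The 'inputs' part of the show_workflow() result.
details = {'inputs': {'29': {'label': 'Features', 'value': ''},
                      '30': {'label': 'Exons', 'value': ''}}}

labelled_map = build_dataset_map(details, {'Exons': 'cbbbf59e8f08c98c',
                                           'Features': '964b37715ec9bd22'})
print(labelled_map)
```

The result is the same dictionary as the hand-written dataset map and could be passed to run_workflow() in its place.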
Two input datasets are used, and one of them is 'UCSC Main on Human: knownGene (chr22:1-51304566)'.
Passing its id 'cbbbf59e8f08c98c' to show_dataset() returns detailed information about that input dataset.
dataset = gi.datasets.show_dataset('cbbbf59e8f08c98c')
dataset
{u'accessible': True, u'api_type': u'file', u'data_type': u'bed', u'deleted': False, u'display_apps': [{u'label': u'display in IGB', u'links': [{u'href': u'/display_application/cbbbf59e8f08c98c/igb_bed/Local', u'target': u'_blank', u'text': u'Local'}, {u'href': u'/display_application/cbbbf59e8f08c98c/igb_bed/Web', u'target': u'_blank', u'text': u'Web'}]}, {u'label': u'display at Ensembl', u'links': [{u'href': u'/display_application/cbbbf59e8f08c98c/ensembl_interval/ensembl_Current', u'target': u'_blank', u'text': u'Current'}]}, {u'label': u'display at RViewer', u'links': [{u'href': u'/display_application/cbbbf59e8f08c98c/rviewer_interval/lbl_main', u'target': u'_blank', u'text': u'main'}]}], u'display_types': [], u'download_url': u'/api/histories/df7a1f0c02a5b08e/contents/cbbbf59e8f08c98c/display', u'file_ext': u'bed', u'file_size': 797714, u'genome_build': u'hg19', u'hda_ldda': u'hda', u'hid': 1, u'history_id': u'df7a1f0c02a5b08e', u'id': u'cbbbf59e8f08c98c', u'metadata_chromCol': 1, u'metadata_column_names': None, u'metadata_column_types': [u'str', u'int', u'int', u'str', u'int', u'str'], u'metadata_columns': 6, u'metadata_comment_lines': None, u'metadata_data_lines': 12410, u'metadata_dbkey': u'hg19', u'metadata_endCol': 3, u'metadata_nameCol': 4, u'metadata_startCol': 2, u'metadata_strandCol': 6, u'metadata_viz_filter_cols': [4], u'misc_blurb': u'12,410 regions', u'misc_info': u'', u'model_class': u'HistoryDatasetAssociation', u'name': u'UCSC Main on Human: knownGene (chr22:1-51304566)', u'peek': u'<table cellspacing="0" 
cellpadding="3"><tr><th>1.Chrom</th><th>2.Start</th><th>3.End</th><th>4.Name</th><th>5</th><th>6.Strand</th></tr><tr><td>chr22</td><td>16258185</td><td>16258303</td><td>uc002zlh.1_cds_1_0_chr22_16258186_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16266928</td><td>16267095</td><td>uc002zlh.1_cds_2_0_chr22_16266929_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16268136</td><td>16268181</td><td>uc002zlh.1_cds_3_0_chr22_16268137_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16269872</td><td>16269943</td><td>uc002zlh.1_cds_4_0_chr22_16269873_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16275206</td><td>16275277</td><td>uc002zlh.1_cds_5_0_chr22_16275207_r</td><td>0</td><td>-</td></tr><tr><td>chr22</td><td>16277747</td><td>16277885</td><td>uc002zlh.1_cds_6_0_chr22_16277748_r</td><td>0</td><td>-</td></tr></table>', u'purged': False, u'state': u'ok', u'uuid': None, u'visible': True, u'visualizations': [u'trackster', u'circster', u'scatterplot']}
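The dataset record includes a `state` field ('ok' above). Output datasets of a workflow may not be finished when their ids are first returned, so it can be useful to poll that field before reading results. The sketch below takes the lookup function as an argument so it can be tried without a server. The helper `poll_until_done` is hypothetical, and treating only 'ok' and 'error' as final states is a simplifying assumption.

```python
import time


def poll_until_done(show_dataset, dataset_id, interval=2.0, timeout=300.0):
    """Poll until the dataset reaches a final state, then return its details.

    show_dataset is any callable that takes a dataset id and returns a dict
    with a 'state' key, e.g. gi.datasets.show_dataset.
    Assumption: only 'ok' and 'error' are treated as final states.
    """
    deadline = time.time() + timeout
    while True:
        details = show_dataset(dataset_id)
        if details['state'] in ('ok', 'error'):
            return details
        if time.time() >= deadline:
            raise RuntimeError("dataset %s still %r after %s s"
                               % (dataset_id, details['state'], timeout))
        time.sleep(interval)


# Stand-in for a server: reports 'queued', then 'running', then 'ok'.
_states = iter(['queued', 'running', 'ok'])


def fake_show_dataset(dataset_id):
    return {'id': dataset_id, 'state': next(_states)}


result = poll_until_done(fake_show_dataset, '6fc9fbb81c497f69', interval=0)
print(result['state'])
```

Against a live server the call would be `poll_until_done(gi.datasets.show_dataset, output_id)` for each id in `outputs['outputs']`.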
from IPython.core.display import HTML
# Collect the name and HTML preview ('peek') of every workflow output and render them together.
merged_htmls = ""
for output in outputs['outputs']:
    dataset = gi.datasets.show_dataset(output)
    #pprint.pprint(dataset)
    name = dataset['name']
    html = dataset['peek']
    merged_htmls += "<p><b>%s</b>" % name + html + "</p>"
HTML(merged_htmls)
Join on data 2 and data 1
1.Chrom | 2.Start | 3.End | 4.Name | 5 | 6.Strand | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|---|
chr22 | 16258185 | 16258303 | uc002zlh.1_cds_1_0_chr22_16258186_r | 0 | - | chr22 | 16258278 | 16258279 | rs2845178 | 0 | + |
chr22 | 16266928 | 16267095 | uc002zlh.1_cds_2_0_chr22_16266929_r | 0 | - | chr22 | 16267031 | 16267032 | rs7292200 | 0 | + |
chr22 | 16266928 | 16267095 | uc002zlh.1_cds_2_0_chr22_16266929_r | 0 | - | chr22 | 16266963 | 16266964 | rs10154680 | 0 | + |
chr22 | 16266928 | 16267095 | uc002zlh.1_cds_2_0_chr22_16266929_r | 0 | - | chr22 | 16267011 | 16267012 | rs7290262 | 0 | + |
chr22 | 16266928 | 16267095 | uc002zlh.1_cds_2_0_chr22_16266929_r | 0 | - | chr22 | 16267037 | 16267038 | rs2818572 | 0 | + |
chr22 | 16269872 | 16269943 | uc002zlh.1_cds_4_0_chr22_16269873_r | 0 | - | chr22 | 16269933 | 16269934 | rs2845206 | 0 | + |
Group on data 8
1 | 2 |
---|---|
uc002zlh.1_cds_1_0_chr22_16258186_r | 1 |
uc002zlh.1_cds_2_0_chr22_16266929_r | 4 |
uc002zlh.1_cds_4_0_chr22_16269873_r | 1 |
uc002zlh.1_cds_5_0_chr22_16275207_r | 2 |
uc002zlh.1_cds_6_0_chr22_16277748_r | 5 |
uc002zlh.1_cds_7_0_chr22_16279195_r | 2 |
Sort on data 9
1 | 2 |
---|---|
uc010gsw.2_cds_1_0_chr22_21480537_r | 67 |
uc021wmb.1_cds_0_0_chr22_21480537_r | 67 |
uc002zoc.3_cds_0_0_chr22_18834445_f | 58 |
uc021wnd.1_cds_0_0_chr22_24647973_f | 50 |
uc021wmc.1_cds_0_0_chr22_21637809_f | 47 |
uc003bhh.3_cds_0_0_chr22_46652458_r | 46 |
Select first on data 10
1 | 2 |
---|---|
uc010gsw.2_cds_1_0_chr22_21480537_r | 67 |
uc021wmb.1_cds_0_0_chr22_21480537_r | 67 |
uc002zoc.3_cds_0_0_chr22_18834445_f | 58 |
uc021wnd.1_cds_0_0_chr22_24647973_f | 50 |
uc021wmc.1_cds_0_0_chr22_21637809_f | 47 |
top 5 exons
1.Chrom | 2.Start | 3.End | 4.Name | 5 | 6.Strand |
---|---|---|---|---|---|
chr22 | 18834444 | 18835833 | uc002zoc.3_cds_0_0_chr22_18834445_f | 0 | + |
chr22 | 21480536 | 21481925 | uc010gsw.2_cds_1_0_chr22_21480537_r | 0 | - |
chr22 | 21480536 | 21481925 | uc021wmb.1_cds_0_0_chr22_21480537_r | 0 | - |
chr22 | 21637808 | 21638558 | uc021wmc.1_cds_0_0_chr22_21637809_f | 0 | + |
chr22 | 24647972 | 24649256 | uc021wnd.1_cds_0_0_chr22_24647973_f | 0 | + |
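When the previews are needed as data rather than as rendered HTML, the 'peek' table can be parsed with the standard library. This is a sketch assuming Python 3 and assuming that 'peek' is a simple table of `<th>`/`<td>` cells inside `<tr>` rows, as in the show_dataset() output earlier. The class `PeekTableParser` and the function `peek_rows` are hypothetical names.

```python
from html.parser import HTMLParser


class PeekTableParser(HTMLParser):
    """Collect the text of each <th>/<td> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self.rows.append([])
        elif tag in ('th', 'td'):
            self._in_cell = True
            self.rows[-1].append('')

    def handle_endtag(self, tag):
        if tag in ('th', 'td'):
            self._in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_cell:
            self.rows[-1][-1] += data


def peek_rows(peek_html):
    """Return the table in a 'peek' string as a list of rows of cell text."""
    parser = PeekTableParser()
    parser.feed(peek_html)
    parser.close()
    return parser.rows


# Shortened from the 'peek' value of the input dataset.
peek = ('<table cellspacing="0" cellpadding="3">'
        '<tr><th>1.Chrom</th><th>2.Start</th><th>3.End</th></tr>'
        '<tr><td>chr22</td><td>16258185</td><td>16258303</td></tr></table>')
rows = peek_rows(peek)
print(rows)
```

The first row holds the column headers and the remaining rows hold the previewed records.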
We successfully executed the workflow from Python with BioBlend and displayed the results in IPython.