In [1]:
import traceback

VisTrails API example

This notebook showcases the new API. Inlined are some comments on design decisions.

In [2]:
import vistrails as vt

There are a few reasons for the API not to be under the top-level vistrails package. They are:

  • Versioning; rolling out a new API would be difficult (this is the third attempt at an API already).
  • Weight; the top-level package is always imported even if VisTrails is not to be used programmatically. This is a non-issue if the API doesn't do any initialization before the first API call, and provided that importing stuff from the rest of VisTrails doesn't have harmful side-effects.

Vistrails and Pipelines

Vistrail objects are currently obtained through load_vistrail(), although they can also be constructed internally from an existing Pipeline or VistrailController.

Question: should we allow Vistrail('path/to/file') as well?

In [3]:
vistrail = vt.load_vistrail('examples/simplemath.vt')

A Vistrail is basically a controller. From it we can get Pipelines, but it is also stateful (i.e. it has a current version); this is useful for editing (creating new versions from the current one). It also provides the same interface as Pipeline, implicitly acting on the current_pipeline.
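As a rough illustration of this design, here is a toy model of a stateful object that forwards Pipeline calls to the currently selected version. The classes are simplified stand-ins written for this note (they only borrow the method names used in this notebook) and say nothing about how upgrades are handled.

```python
class Pipeline(object):
    """Toy pipeline: just a list of module names."""
    def __init__(self, modules):
        self.modules = list(modules)

    def execute(self):
        return "executed %d modules" % len(self.modules)


class Vistrail(object):
    """Toy stateful wrapper holding several versions of a pipeline."""
    def __init__(self, versions):
        self._versions = dict(versions)  # version number -> Pipeline
        self.current_version = -1        # nothing selected yet

    def select_version(self, version):
        if version not in self._versions:
            raise KeyError(version)
        self.current_version = version

    def select_latest_version(self):
        self.current_version = max(self._versions)

    def get_pipeline(self, version):
        return self._versions[version]

    @property
    def current_pipeline(self):
        return self._versions[self.current_version]

    def execute(self, *args, **kwargs):
        # Same interface as Pipeline, acting on the selected version.
        return self.current_pipeline.execute(*args, **kwargs)
```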

Problem: upgrades are not handled consistently. get_pipeline() can return non-upgraded pipelines; in particular, vistrail.get_pipeline(vistrail.current_version) returns the non-upgraded pipeline, which is unexpected.

In [4]:
vistrail
Out[4]:
[Version tree: 0 -> 28 "Added annotation"]
<Vistrail: simplemath.vt, version -1, not changed>
In [5]:
vistrail.select_latest_version()
In [6]:
vistrail
Out[6]:
[Version tree: 0 -> 28 "Added annotation"]
<Vistrail: simplemath.vt, version 28, not changed>
In [7]:
vistrail.get_pipeline(2)
Out[7]:
[Pipeline graph: module0 PythonCalc]
<Pipeline: 1 modules, 0 connections>

Packages

Only basic_modules (and abstractions?) are loaded on initialization, so that using the API stays fast. A package might be auto-enabled when it is requested, which is efficient and convenient.

load_package() only uses package identifiers (although we could add version specifiers?); I don't think we want to worry about names/codepaths.

In [8]:
tabledata = vt.load_package('org.vistrails.vistrails.tabledata')
tabledata
Out[8]:
<Package: org.vistrails.vistrails.tabledata, 23 modules>

You can get Modules from the package using the dot or bracket syntax. These modules are "dangling" modules, not yet instantiated in a specific pipeline/vistrail.

I chose not to make a distinction between module descriptors and pipeline modules (module descriptors are just modules that are not yet connected to a pipeline) to simplify things and keep the number of concepts low.

In [9]:
tabledata.convert
Out[9]:
<Namespace convert of package org.vistrails.vistrails.tabledata>
In [10]:
from vistrails.core.modules.module_registry import MissingModule
try:
    tabledata['convert']  # can't get namespaces this way, use a dot
except MissingModule:
    pass
else:
    assert False
In [11]:
tabledata.BuildTable, tabledata['BuildTable']
Out[11]:
(vistrails.core.api.BuildTable, vistrails.core.api.BuildTable)
In [12]:
tabledata.read.CSVFile, tabledata['read|CSVFile']
Out[12]:
(vistrails.core.api.CSVFile, vistrails.core.api.CSVFile)

(note: IPython bug 6709 causes the 'vistrails.core.api.' prefixes above)
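The lookup rules demonstrated in the cells above (dot syntax reaches modules and namespaces, bracket syntax only accepts full module names such as 'read|CSVFile') could be modelled roughly as follows. This is a toy reconstruction, not the real Package class; `Namespace`, `_lookup`, and the local `MissingModule` are stand-ins defined here for illustration.

```python
class MissingModule(KeyError):
    """Stand-in for the registry's MissingModule exception."""


class Namespace(object):
    """A prefix such as 'read' that groups modules inside a package."""
    def __init__(self, package, prefix):
        self._package = package
        self._prefix = prefix

    def __getattr__(self, name):
        if name.startswith('_'):
            raise AttributeError(name)
        return self._package._lookup(self._prefix + '|' + name)


class Package(object):
    def __init__(self, identifier, modules):
        self.identifier = identifier
        self._modules = dict(modules)  # e.g. 'read|CSVFile' -> module

    def _lookup(self, full_name):
        if full_name in self._modules:
            return self._modules[full_name]
        # A name that prefixes other names is treated as a namespace.
        if any(key.startswith(full_name + '|') for key in self._modules):
            return Namespace(self, full_name)
        raise MissingModule(full_name)

    def __getitem__(self, name):
        # Bracket syntax: exact module names only, never namespaces.
        try:
            return self._modules[name]
        except KeyError:
            raise MissingModule(name)

    def __getattr__(self, name):
        # Dot syntax: modules or namespaces.
        if name.startswith('_'):
            raise AttributeError(name)
        return self._lookup(name)
```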

Pipeline manipulation

Work in progress...

Execution

In addition to executing a Pipeline or Vistrail, I want to be able to easily pass values in on InputPort modules (to use subworkflows as Python functions) and get results out (either on OutputPort modules or any port of any module).

Execution returns a Results object from which you can get all of this, and that would be integrated with IPython to inline images and objects that support it (matplotlib, ...).
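For the IPython integration, one option is the rich display protocol: IPython looks for methods such as `_repr_html_` or `_repr_png_` on an object and uses them instead of `__repr__` when rendering a cell's output. A minimal sketch, assuming a hypothetical `ToyResult` class that is not the API's actual result type:

```python
from html import escape


class ToyResult(object):
    """Toy result object mapping output port names to values."""
    def __init__(self, outputs):
        self._outputs = dict(outputs)

    def output_port(self, name):
        return self._outputs[name]

    def __repr__(self):
        return "<ToyResult: %d outputs>" % len(self._outputs)

    def _repr_html_(self):
        # IPython calls this method, when present, to render the object
        # as HTML in the notebook instead of using __repr__.
        rows = "".join(
            "<tr><td>%s</td><td>%s</td></tr>"
            % (escape(name), escape(repr(value)))
            for name, value in sorted(self._outputs.items()))
        return "<table>%s</table>" % rows
```

An image-producing output could expose `_repr_png_` in the same way, returning the PNG bytes.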

Getting outputs

In [13]:
outputs = vt.load_vistrail('examples/outputs.vt')
outputs.select_version(1)
outputs
Out[13]:
[Version tree: 0 -> 1 "Added module" -> 5 "Added parameter"]
<Vistrail: outputs.vt, version 1, not changed>
In [14]:
# Errors
try:
    result = outputs.execute()
except vt.ExecutionErrors:
    traceback.print_exc()
else:
    assert False
Traceback (most recent call last):
  File "<ipython-input-14-979bf6416e43>", line 3, in <module>
    result = outputs.execute()
  File "vistrails\core\api.py", line 205, in execute
    return self.current_pipeline.execute(*args, **kwargs)
  File "vistrails\core\api.py", line 424, in execute
    raise ExecutionErrors(self, result)
ExecutionErrors: Pipeline execution failed: 1 error:
0: Missing value from port value
In [15]:
# Results
outputs.select_latest_version()
result = outputs.execute()
result
Out[15]:
<ExecutionResult: 2 modules>
In [16]:
outputs
Out[16]:
[Version tree: 0 -> 5 "Added parameter"]
<Vistrail: outputs.vt, version 5, changed>
In [17]:
outputs.current_pipeline
Out[17]:
[Pipeline graph: module0 String (port value) -> module1 OutputPort (port InternalPipe)]
<Pipeline: 2 modules, 1 connections; outputs: msg>
In [18]:
result.module_output(0)
Out[18]:
{'self': <vistrails.core.modules.basic_modules.String at 0x657b8b0>,
 'value': 'Hello, world',
 'value_as_string': 'Hello, world'}
In [19]:
result.output_port('msg')
Out[19]:
'Hello, world'

Setting inputs

In [20]:
pipeline = vistrail.current_pipeline
pipeline
Out[20]:
[Pipeline graph: module1 "First input" and module2 "Second input" both feed module0 "+" and module3 "*"; module0 -> module4 OutputPort; module3 -> module5 OutputPort]
<Pipeline: 6 modules, 6 connections; inputs: in_a, in_b; outputs: out_times, out_plus>
In [21]:
in_a = pipeline.get_input('in_a')
assert (in_a == pipeline.get_module('First input')) is True
in_a
Out[21]:
<Module 'InputPort' from org.vistrails.vistrails.basic, id 1, label "First input">
In [22]:
result = pipeline.execute(in_a == 2, in_b=4)
In [23]:
result.output_port('out_times'), result.output_port('out_plus')
Out[23]:
(8.0, 6.0)
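The `in_a == 2` argument above relies on `==` being overloaded on modules, which is also why the earlier assertion compares the result with `is True`. One way such an overload could work is sketched below. This is a guess at the mechanism for illustration, not the API's implementation; `InputBinding` and `collect_inputs` are hypothetical names.

```python
class InputBinding(object):
    """Pairs an input module with the value to feed into it."""
    def __init__(self, module, value):
        self.module = module
        self.value = value


class Module(object):
    def __init__(self, module_id, name):
        self.module_id = module_id
        self.name = name

    def __eq__(self, other):
        if isinstance(other, Module):
            # Module == Module: ordinary comparison, returns a bool.
            return self.module_id == other.module_id
        # Module == value: build a binding instead of comparing.
        return InputBinding(self, other)

    def __hash__(self):
        # Defining __eq__ disables the default hash in Python 3.
        return hash(self.module_id)


def collect_inputs(modules_by_name, *bindings, **named):
    """Merge 'module == value' bindings and keyword arguments."""
    values = dict(named)
    for binding in bindings:
        if not isinstance(binding, InputBinding):
            raise TypeError("expected 'module == value', got %r" % (binding,))
        values[binding.module.name] = binding.value
    unknown = set(values) - set(modules_by_name)
    if unknown:
        raise KeyError(sorted(unknown))
    return values
```

The trade-off of this design is that `module == value` no longer returns a boolean, so code that needs a real comparison has to compare two modules.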

Other example

In [24]:
im = vt.load_vistrail('examples/imagemagick.vt')
In [25]:
im.select_version('read')
im
Out[25]:
[Version tree: 0 -> 9 "read"; 9 -> 16 "blur"; 9 -> 23 "edges"]
<Vistrail: imagemagick.vt, version 9 (tag read), not changed>
In [26]:
im.execute().output_port('result')
Out[26]:
In [27]:
im.select_version('blur')
im
Out[27]:
[Version tree: 0 -> 9 "read"; 9 -> 16 "blur"; 9 -> 23 "edges"]
<Vistrail: imagemagick.vt, version 16 (tag blur), changed>
In [28]:
im.execute().output_port('result')
Out[28]:
In [29]:
im.select_version('edges')
im.execute().output_port('result')
Out[29]: